Abstract: In this paper, aiming at securing range query, top-k
query, and skyline query in tiered sensor networks, we propose
the Secure Range Query (SRQ), Secure Top-k Query (STQ), and
Secure Skyline Query (SSQ) schemes, respectively. In particular,
SRQ, by using our proposed prime aggregation technique, has
lower communication overhead than prior works, while
STQ and SSQ, to our knowledge, are the first proposals in
tiered sensor networks for securing top-k and skyline queries,
respectively. Moreover, the relatively unexplored issue of the 
security impact of sensor node compromises on multidimensional 
queries is studied; two attacks incurred from the sensor node 
compromises, collusion attack and false-incrimination attack, are 
investigated in this paper. After developing a novel technique 
called subtree sampling, we also explore methods of efficiently 
mitigating the threat of sensor node compromises. Performance 
analyses regarding the probability for detecting incomplete 
query-results and communication cost of the proposed schemes 
are also studied. 

I. Introduction 

Tiered Sensor Networks. Sensor networks are expected to
be deployed in harsh or hostile regions for data collection
or environment monitoring. Since a stable connection between
the authority and the network may not exist,
in-network storage is necessary for caching or storing the
data sensed by sensor nodes. A straightforward method is to
attach external storage to each node, but this is economically
infeasible. Therefore, various data storage models for sensor
networks have been studied in the literature. In [6], [22], a
notion of tiered sensor networks was discussed by introducing
an intermediate tier between the authority and the sensor
nodes. The purpose of this tier is to cache the sensed data
so that the authority can efficiently retrieve the cached data,
avoiding unnecessary communication with sensor nodes.

The network model considered in this paper is the same 
as the ones in [6], [22]. More specifically, some storage- 
abundant nodes, called storage nodes, which are equipped with 
several gigabytes of NAND flash storage [24], are deployed 
as the intermediate tier for data archival and query response. 
In practice, some currently available sensor nodes such as 
RISE [21] and StarGate [29] can work as the storage nodes. 
The performance of sensor networks wherein external flash 
memory is attached to the sensor nodes was also studied in 
[15]. In addition, some theoretical issues concerning the tiered 
sensor networks, such as the optimal storage node placement, 
were also studied in [24], [30]. In fact, such a two-tiered
network architecture has been demonstrated to be useful in
increasing network capacity and scalability, reducing network
management complexity, and prolonging network lifetime.

Multidimensional Queries. Although a large amount of 
sensed data can be stored in storage nodes, the authority 
might be interested in only some portions of them. To this 
end, the authority issues proper queries to retrieve the desired 
portion of sensed data. Note that, when the sensed data have
multiple attributes, the query could be multidimensional. We
have observed that range query, top-k query, and skyline query
are the most commonly used queries. Range query [12], [17],
which could be useful for correlating events occurring within
the network, is used to retrieve sensed data whose attributes
are individually within a specified range. After mapping the
sensed data to a ranking value, top-k query [33], which can be
used to extract or observe extreme phenomena, is used to
retrieve the sensed data whose ranking values are among the
k highest-priority ones. Skyline query [5], [11], due to its promising
application in multi-criteria decision making, is also useful and
important in environment monitoring, industry control, etc.

Nonetheless, in the tiered network model, the storage nodes 
become the targets that are easily compromised because of 
their significant roles in responding to queries. For example, 
the adversary can eavesdrop on the communications among 
nodes or compromise the storage nodes to obtain the sensed 
data, resulting in the breach of data confidentiality. After the 
compromise of storage nodes, the adversary can also return 
falsified query-results to the authority, leading to the breach 
of query-result authenticity. Even more, the compromised 
storage nodes can cause query-result incompleteness, creating 
an incomplete query-result for the authority by dropping some 
portions of the query-result. 

Related Work. Secure range queries in tiered sensor 
networks have been studied only in [23], [31], [40]. Data 
confidentiality and query-result authenticity can be preserved 
very well in [23], [31], [40] owing to the use of the bucket 
scheme [8], [9]. Unfortunately, encoding approach [23] is only 
suitable for the one-dimensional query scenario in the sensor 
networks for environment monitoring purposes. On the other 
hand, crosscheck approaches [31], [40] can be applied on 
sensor networks for event-driven purposes at the expense of the 
reduced probability for detecting query-result incompleteness. 
The security issues incurred from the compromise of storage
nodes have been addressed in [23], [31], [40]. The impact of
collusion attacks, defined as the collusion among compromised
sensor nodes and compromised storage nodes, however, was
only discussed in [40], wherein only a naive method was
proposed as a countermeasure. When the compromised sensor
nodes are taken into account, a Denial-of-Service attack, called
the false-incrimination attack, not addressed in the literature, can
be extremely harmful. In such an attack, the compromised
sensor nodes subvert the functionality of the secure query
schemes by simply claiming that their sensed data have been
dropped by the storage nodes. After that, the innocent storage
nodes will be considered compromised and will be revoked by
the authority. It should be noted that all the previous solutions
suffer from false-incrimination attacks.

Contribution. Our major contributions are: 

• The Secure Range Query (SRQ) scheme is proposed to
secure the range query in tiered networks (Sec. III-A).
By taking advantage of our proposed prime aggregation
technique for securely transmitting the amount of data
in specified buckets, SRQ has a lower communication
cost than prior works in all scenarios (environment
monitoring and event detection purposes), while keeping
the probability of detecting incomplete query-results
close to 1. It should be noted that although incorporating
the bucket scheme [8], [9] (described in Sec. III-A1) in the
protocol design [23], [31], [40] is not new, the novelty
of our method lies in the use of prime aggregation
to reduce the overhead and guarantee query-result
completeness.

• For the first time in the literature, the issues of securing
top-k and skyline queries in tiered networks are studied
(Secs. III-B and III-C). Our solutions to these two issues
are Secure Top-k Query (STQ) and Secure Skyline Query
(SSQ), respectively. The former is built upon the proposed
SRQ scheme to verify query-result completeness,
while the efficiency of SSQ is based on our proposed
grouping technique.

• The security impact of sensor compromises is studied
(Sec. IV); the collusion attack is formally addressed, and a
new Denial-of-Service attack, the false-incrimination attack,
which can thwart the security purpose in prior works,
is first identified in our paper. The resiliency of SRQ,
STQ, and SSQ against these two attacks is investigated.
With a novel technique called subtree sampling, some
minor modifications are introduced for SRQ and STQ
as countermeasures to these two attacks. Moreover, the
compromised nodes can even be efficiently identified and
further attested [25], [26], [28].

II. System Model 

In general, the models used in this paper are very similar 
to those in [6], [22], [23], [31], [40]. 

Network Model. As shown in Fig. 1, the sensor network
considered in this paper is composed of a large number of
resource-constrained sensor nodes and a few so-called storage
nodes. Storage nodes are assumed to be storage-abundant and
may be compromised. In addition, in certain cases, storage
nodes could also have abundant resources in energy, computation,
and communication. The storage nodes can communicate
with the authority via direct or multi-hop communications. The
network is connected such that, for two arbitrary nodes, at least
one path connecting them can be found.

Fig. 1: A tiered sensor network.

A cell is composed of a storage node and a number of
sensor nodes. In a cell, sensor nodes could be far away
from the associated storage node so that they can communicate
with each other only through multi-hop communication.
For example, in Fig. 1, without the relay of the
gray node, the black node cannot reach the storage node.
The nodes in the network have synchronized clocks [27] and
the time is divided into epochs. As in [23], [24], [31], [40],
each node is assumed to be aware of its geographic position
[14], [38] so that the association between the sensor node and
storage node can be established. As a matter of fact, information
about the time and geographic position is indispensable
for most sensor network applications.

For each cell, aggregation is assumed to be performed 
over an aggregation tree rooted at the storage node. Since 
the optimization of the aggregation tree structure is out of 
the scope of this paper, we adopt the method described in 
TAG [16] to construct an aggregation tree. We follow the 
conventional assumption that the topology of the aggregation 
tree is known by the authority [2], [4]. Similar to sensor nodes, 
storage nodes also perform the sensing task. Each sensor node 
senses the data and temporarily stores the sensed data in its 
local memory within an epoch. At the end of each epoch, 
the sensor nodes in a cell report the sensed data stored in 
local memory to the associated storage node. Throughout this
paper, we focus on a cell $C$, composed of $N-1$ sensor nodes,
$\{s_i\}_{i=1}^{N-1}$, and a storage node $\mathcal{M}$.

Security Model. We consider the adversary who can com- 
promise an arbitrary number of storage nodes. After node 
compromises, all the information stored in the compromised 
storage nodes will be exposed to the adversary. The goal 
of the adversary is to breach at least one of the following: 
data confidentiality, query-result authenticity, and query-result 
completeness. We temporarily do not consider the compromise
of sensor nodes in describing SRQ, STQ, and SSQ in Sec. III.
The impact of sensor node compromise on the security breach,
however, will be explored in Sec. IV. Many security issues
in sensor networks, such as key management [3], [7], [36],
broadcast authentication [13], [19], and secure localization
[14], [38], have been studied in the literature. This paper
focuses on securing multidimensional queries, which are relatively
unexplored in the literature, while the protocol design for the
aforementioned issues is beyond the scope of this paper.

Query Model. The sensed data can be represented as a
d-dimensional tuple, $(A_1, A_2, \ldots, A_d)$, where $A_g$, $\forall g \in [1,d]$,
denotes the g-th attribute. The authority may issue a proper
d-dimensional query to retrieve the desired portion of data stored
in storage nodes. Three types of queries, including range query,
top-k query, and skyline query, are considered in this paper.
For range query, its form, issued by the authority, is expressed
as $\langle C, t, l_1, h_1, \ldots, l_d, h_d \rangle$, which means that the sensed data to
be reported to the authority should be generated by the nodes
in cell $C$ at epoch $t$, and their g-th attributes, the $A_g$'s, should be
within the range $[l_g, h_g]$, $g \in [1,d]$. Top-k query is usually
associated with a scalar (linear) ranking function. With a ranking
function $R$, the sensed data, even if it is multidimensional, can
be individually mapped to a one-dimensional ranking value.
The top-k query issued by the authority is in the form of
$\langle C, t, R, k \rangle$. As the first attempt to achieve secure top-k query,
the goal of top-k query in this paper is simply assumed to be
obtaining the sensed data generated by the nodes in cell $C$ at epoch
$t$ with the $k$ smallest ranking values. For skyline query,
the desired skyline data are defined as those not dominated by
any other data. Assuming that smaller values are preferable to
larger ones for all attributes, for a set of d-dimensional data, a
datum $c_i$ dominates another datum $c_j$ if both conditions,
$A_g(c_i) \le A_g(c_j)$, $\forall g \in [1,d]$, and $A_g(c_i) < A_g(c_j)$, $\exists g \in [1,d]$,
hold, where $A_g(c_i)$ denotes the g-th attribute value of the
datum $c_i$. Hence, the form of the skyline query issued
by the authority is given as $\langle C, t \rangle$, which is used to retrieve
the skyline data generated in cell $C$ at epoch $t$.
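
The dominance relation above can be illustrated with a minimal Python sketch. The function names `dominates` and `skyline` are ours and purely illustrative; the sketch assumes that smaller values are preferable in every attribute and uses the naive pairwise comparison rather than any optimized skyline algorithm.

```python
from typing import List, Sequence, Tuple

Datum = Tuple[float, ...]


def dominates(ci: Sequence[float], cj: Sequence[float]) -> bool:
    """True if ci is no worse than cj in every attribute and strictly
    better (smaller) in at least one attribute."""
    no_worse = all(a <= b for a, b in zip(ci, cj))
    strictly_better = any(a < b for a, b in zip(ci, cj))
    return no_worse and strictly_better


def skyline(data: List[Datum]) -> List[Datum]:
    """Naive O(n^2) skyline: keep every datum not dominated by another."""
    return [c for c in data if not any(dominates(o, c) for o in data)]


example = [(1, 3), (2, 4), (2, 11), (3, 1)]
result = skyline(example)  # (2, 4) and (2, 11) are dominated by (1, 3)
```

In this toy data set, (1, 3) and (3, 1) are incomparable, so both belong to the skyline.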

III. Securing Multidimensional Queries 

In this section, aiming at securing range query, top-k query,
and skyline query, we propose the SRQ (Sec. III-A), STQ
(Sec. III-B), and SSQ (Sec. III-C) schemes, respectively. Note
that though SRQ, STQ, and SSQ use the bucket scheme [8],
[9], their novelty lies in their design for efficiently
detecting an incomplete query-result (described later).

A. Securing Range Queries (SRQ) 

Our proposed SRQ scheme consists of a confidentiality-preserving
reporting phase (Sec. III-A1) that can simultaneously
prevent the adversary from accessing data stored in
the storage nodes, authenticate the query results, and ensure
efficient multidimensional query processing, and a query-result
completeness verification phase (Sec. III-A2) for guaranteeing
the completeness of query-results.

1) Confidentiality-preserving reporting: Data encryption is
a straightforward and common method of ensuring data
confidentiality against a compromised storage node. Moreover, we
hope that even when the adversary compromises the storage
node, the previously stored information is not exposed
to the adversary. To this end, the keys used in encryption
are selected from a one-way hash chain. In particular,
assume that a key $K_{i,0}$ is initially stored in sensor node $s_i$.
At the beginning of epoch $t$, the key $K_{i,t}$, which is used only
within epoch $t$, is calculated as $hash(K_{i,t-1})$, where $hash(\cdot)$
is a hash function, and $K_{i,t-1}$ is dropped. Suppose that sensor
node $s_i$ has sensed data $D$ at epoch $t$. One method for storing
$D$ in the storage node $\mathcal{M}$ while preserving the privacy is to
send $\{D\}_{K_{i,t}}$, which denotes the encryption of $D$ with the
key $K_{i,t}$. With this method, when an OCB-like authenticated
encryption primitive [20] is exploited, the authenticity of $D$
can be guaranteed. At the same time, $D$ will not be known
by the adversary during message forwarding, or even after
the compromise of the storage node at epoch $t$, because the
adversary cannot recover the keys used before
epoch $t$. Nevertheless, no query can be answered by $\mathcal{M}$ if
only encrypted data is stored in $\mathcal{M}$. Hence, the bucket scheme
proposed in [8], [9], which uses the encryption keys generated
via a one-way hash chain, is used in the SRQ scheme.
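
As a hedged illustration of the per-epoch key update, the following Python sketch evolves a key along a one-way hash chain. SHA-256 is our assumption, since the hash function is left unspecified here, and the names `next_key` and `key_at_epoch` are hypothetical. The authenticated encryption step itself is not shown.

```python
import hashlib


def next_key(prev_key: bytes) -> bytes:
    """K_{i,t} = hash(K_{i,t-1}); SHA-256 is an assumed instantiation."""
    return hashlib.sha256(prev_key).digest()


def key_at_epoch(k0: bytes, t: int) -> bytes:
    """Recompute K_{i,t} from the initial key K_{i,0}, as the authority
    (which knows K_{i,0}) could do. A sensor node would instead keep only
    the current key and overwrite it each epoch."""
    key = k0
    for _ in range(t):
        key = next_key(key)
    return key


k0 = b"initial key of sensor i (illustrative)"
k3 = key_at_epoch(k0, 3)
```

Because the hash is one-way, a node that is compromised at epoch $t$ and holds only $K_{i,t}$ does not reveal the keys of earlier epochs.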

In the bucket scheme, the domain of each attribute $A_g$,
$\forall g \in [1,d]$, is assumed to be known in advance, and is
divided into $w_g \ge 1$ consecutive non-overlapping intervals
sequentially indexed from 1 to $w_g$, under a publicly known
partitioning rule. For ease of representation, in the following,
we assume that $w_g = w$, $\forall g \in [1,d]$. A d-dimensional bucket
is defined as a tuple, $(v_1, v_2, \ldots, v_d)$ (hereafter called bucket
ID), where $v_g \in [1,w]$, $g \in [1,d]$. The sensor node $s_i$, when
it has sensed data at epoch $t$, sends to $\mathcal{M}$ the corresponding
bucket IDs, which are constructed by mapping each attribute
of the sensed data to the proper interval index, and the sensed
data encrypted by the key $K_{i,t}$. For example, when $s_i$ has
sensed data (1,3), (2,4), and (2,11) at epoch $t$, the message
transmitted to the storage node at the end of epoch $t$ is
$\langle i, t, (1,1), \{(1,3),(2,4)\}_{K_{i,t}}, (1,2), \{(2,11)\}_{K_{i,t}} \rangle$, assuming
that $A_1, A_2 \in [1,20]$, $w = 2$, and each interval has the same
length, 10.
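
The example above can be reproduced with a short Python sketch. It assumes integer-valued attribute domains split into equal-width intervals (here [1,10] and [11,20]); the helper names are ours, and the encryption of each group under $K_{i,t}$ is omitted.

```python
from collections import defaultdict


def interval_index(value, lo, hi, w):
    """1-based index of the equal-width interval of the integer domain
    [lo, hi] that contains value (assumed partitioning rule)."""
    width = (hi - lo + 1) / w
    return min(w, int((value - lo) // width) + 1)


def bucket_id(datum, domains, w):
    """Map each attribute of a datum to its interval index."""
    return tuple(interval_index(a, lo, hi, w)
                 for a, (lo, hi) in zip(datum, domains))


def group_by_bucket(data, domains, w):
    """Group the sensed data of one node by bucket ID; each group would
    then be encrypted under K_{i,t} and sent together with its bucket ID."""
    groups = defaultdict(list)
    for datum in data:
        groups[bucket_id(datum, domains, w)].append(datum)
    return dict(groups)


domains = [(1, 20), (1, 20)]
message = group_by_bucket([(1, 3), (2, 4), (2, 11)], domains, w=2)
```

Here `message` pairs bucket (1,1) with the data (1,3) and (2,4), and bucket (1,2) with (2,11), matching the message in the text.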

Let $\mathcal{V}$ be the set of all possible bucket IDs. Assume that
there are on average $Y$ and $Y/N$ data generated in a cell and
in a node, respectively, at epoch $t$. Assume that $D_{i,t,V}$ is a
set containing all the data within the bucket $V \in \mathcal{V}$ sensed
by $s_i$ at epoch $t$. The messages sent from $s_i$ to $\mathcal{M}$ at the
end of epoch $t$ can be abstracted as $\langle i, t, J_\sigma, \{D_{i,t,J_\sigma}\}_{K_{i,t}} \rangle$,
where $J_\sigma \in \mathcal{V}$, $J_\sigma \ne J_{\sigma'}$, $1 \le \sigma, \sigma' \le Y/N$, if there are some
data sensed by $s_i$ within epoch $t$. Note that $s_i$ sends nothing
to $\mathcal{M}$ if the $D_{i,t,J_\sigma}$'s, $\forall J_\sigma \in \mathcal{V}$, are empty. After that, $\mathcal{M}$ can
answer the range query according to the information revealed
by the bucket IDs. Assume that $l_g$ and $h_g$ are located within
the $\alpha_g$-th and $\beta_g$-th intervals, respectively, where $\alpha_g \le \beta_g$, $\alpha_g$,
$\beta_g \in [1,w]$, and $g \in [1,d]$. The encrypted data falling into the
buckets in the set $\mathcal{A} = \{(\rho_1, \ldots, \rho_d) \mid \alpha_g \le \rho_g \le \beta_g, g \in [1,d]\}$
are reported to the authority. In other words, once
receiving the range query, $\mathcal{M}$ first translates the information
$l_1, h_1, \ldots, l_d, h_d$ into the proper bucket IDs and then replies with
all the encrypted data falling into the buckets$^1$ in $\mathcal{A}$.
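
The translation of a range query into the bucket set $\mathcal{A}$ can be sketched as follows. This is a minimal illustration under the same assumed equal-width partitioning of integer domains; `query_buckets` is a hypothetical helper name.

```python
from itertools import product


def interval_index(value, lo, hi, w):
    """1-based index of the equal-width interval of [lo, hi] containing value."""
    width = (hi - lo + 1) / w
    return min(w, int((value - lo) // width) + 1)


def query_buckets(ranges, domains, w):
    """Return the bucket IDs (rho_1, ..., rho_d) with alpha_g <= rho_g <= beta_g,
    where alpha_g and beta_g are the intervals containing l_g and h_g."""
    per_dimension = []
    for (l, h), (lo, hi) in zip(ranges, domains):
        alpha = interval_index(l, lo, hi, w)
        beta = interval_index(h, lo, hi, w)
        per_dimension.append(range(alpha, beta + 1))
    return sorted(product(*per_dimension))


domains = [(1, 20), (1, 20)]
# Query: A_1 in [2, 5] and A_2 in [8, 15]
buckets = query_buckets([(2, 5), (8, 15)], domains, w=2)
```

The storage node would reply with every encrypted datum stored under one of the returned bucket IDs, which is why some superfluous data may be returned.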

Nevertheless, in tiered sensor networks, even when the
original bucket scheme is used, $\mathcal{M}$ could still maliciously
drop some encrypted data and only report part of the results
to the authority, resulting in an incomplete query-result. In
the following, we will describe an extended bucket scheme,
which incorporates the prime aggregation strategy into the
original bucket scheme, to detect the incomplete reply in a
communication-efficient manner.

$^1$There is a tradeoff between the communication cost and confidentiality
in terms of bucket sizes because a larger bucket size implies higher data
confidentiality and higher communication cost due to more superfluous data
being returned to the authority. The design of optimal bucketing strategies is
beyond the scope of this paper, and we refer to [8], [9] for more details.

2) Query-Result Completeness Verification: With prime 
aggregation technique, SRQ detects an incomplete reply by 
taking advantage of aggregation for counting the amount of 
sensed data falling into specified buckets. Together with a 
hash for verification purpose, the count forms a so-called 
proof for detecting an incomplete reply. The storage node $\mathcal{M}$
is required to provide the proof to the authority at the epoch 
specified in the query so that the authority can use the proof 
to verify the completeness of received query-results. Since in 
our design all the sub-proofs generated by the nodes can be 
aggregated to yield the final proof, the communication cost can 
be significantly reduced. The details are described as follows. 

Assume that an aggregation tree [16] has been constructed
after sensor deployment. Recall that the domain of attribute $A_g$
is divided into $w$ intervals. Before the sensor deployment, a
set $\{p_i^V, p_i^\emptyset \mid i \in [1,N-1], V \in \mathcal{V}\}$ of $(w^d+1)(N-1)$ prime
numbers is selected by the authority such that all of them are
pairwise distinct; in particular, $p_i^V \ne p_{i'}^{V'}$ if $i \ne i'$ or $V \ne V'$.
Then, the set $\{p_i^V, p_i^\emptyset \mid V \in \mathcal{V}\}$ of
$w^d+1$ prime numbers, called the set of bucket primes of $s_i$, is
stored in each sensor node $s_i$. In addition, a set $\{k_{i,0}^V, k_{i,0}^\emptyset \mid V \in \mathcal{V}\}$
of $w^d+1$ keys is selected by the authority and is stored
in each sensor node $s_i$ initially. For fixed $i$ and $t$, the set
$\{k_{i,t}^V, k_{i,t}^\emptyset \mid V \in \mathcal{V}\}$ is called the set of bucket keys of $s_i$ at epoch
$t$. Bucket primes could be publicly known, while bucket keys
should be kept secret. Each sensor node $s_i$, at the beginning of
epoch $t$, calculates $k_{i,t}^V = hash(k_{i,t-1}^V)$ and then drops $k_{i,t-1}^V$,
$\forall V \in \mathcal{V}$. In addition, $s_i$ also calculates $k_{i,t}^\emptyset = hash(k_{i,t-1}^\emptyset)$
and then drops $k_{i,t-1}^\emptyset$.

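Recall that each node $s_i$ on average has $Y/N$ sensed data
at epoch $t$, and assume that the set of $Y/N$ bucket IDs
associated with these $Y/N$ sensed data is $B_{i,t} = \{v_{i,t,\sigma} \mid \sigma =
1, \ldots, Y/N\}$, which could be a multiset. Then, according to
its sensed data, $s_i$ calculates $H_{i,t} = hash_{K_{i,t}}(\sum_{\sigma=1}^{Y/N} k_{i,t}^{v_{i,t,\sigma}})$,
where $hash_K(\cdot)$ denotes the keyed hash function with key
$K$, if it has sensed data, and $H_{i,t} = hash_{K_{i,t}}(k_{i,t}^\emptyset)$
otherwise. Moreover, $s_i$ computes $P_{i,t} = \prod_{\sigma=1}^{Y/N} p_i^{v_{i,t,\sigma}}$ if it
has sensed data, and $P_{i,t} = p_i^\emptyset$ otherwise. Moreover, once
receiving $\langle j_\rho, t, \mathcal{E}_{j_\rho,t}, \mathcal{B}_{j_\rho,t}, \mathcal{H}_{j_\rho,t}, \mathcal{P}_{j_\rho,t} \rangle$, $j_\rho \in [1,N-1]$, $\forall \rho \in
[1,\chi]$, from its $\chi$ children, $s_{j_1}, \ldots, s_{j_\chi}$, $s_i$ calculates $\mathcal{E}_{i,t} =
(\bigcup_{\rho=1}^{\chi} \mathcal{E}_{j_\rho,t}) \cup E_{i,t}$, where $E_{i,t}$ denotes the set of encrypted
data sensed by $s_i$ at epoch $t$, and $\mathcal{B}_{i,t} = (\bigcup_{\rho=1}^{\chi} \mathcal{B}_{j_\rho,t}) \cup B_{i,t}$,
where $B_{i,t}$ denotes the set of bucket IDs of $E_{i,t}$. In addition,
$s_i$ also calculates $\mathcal{H}_{i,t} = hash_{K_{i,t}}(\sum_{\rho=1}^{\chi} \mathcal{H}_{j_\rho,t} + H_{i,t})$ and
$\mathcal{P}_{i,t} = \prod_{\rho=1}^{\chi} \mathcal{P}_{j_\rho,t} \cdot P_{i,t}$, $\rho \in [1,\chi]$. Finally, $s_i$ reports
$\langle i, t, \mathcal{E}_{i,t}, \mathcal{B}_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t} \rangle$ to its parent node on the aggregation
tree. Note that, if $s_i$ is a leaf node on the aggregation tree,
then we assume that it receives $\langle \emptyset, \emptyset, \emptyset, \emptyset, 0, 1 \rangle$.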
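
The sub-proof computation and its aggregation along the tree can be sketched in Python as follows. Several choices here are our assumptions rather than part of the scheme description: HMAC-SHA256 stands in for the keyed hash, hash outputs and bucket keys are treated as integers so that they can be summed, and `None` denotes the empty bucket $\emptyset$. The sets of encrypted data and bucket IDs are omitted.

```python
import hashlib
import hmac
from math import prod


def keyed_hash(key: bytes, value: int) -> int:
    """hash_K(value) as an integer; HMAC-SHA256 is an assumed instantiation."""
    msg = value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def local_proof(K, bucket_keys, bucket_primes, bucket_ids):
    """Return (H_{i,t}, P_{i,t}) for the multiset of bucket IDs B_{i,t}."""
    if not bucket_ids:  # the node sensed nothing in this epoch
        return keyed_hash(K, bucket_keys[None]), bucket_primes[None]
    H = keyed_hash(K, sum(bucket_keys[v] for v in bucket_ids))
    P = prod(bucket_primes[v] for v in bucket_ids)
    return H, P


def aggregate(K, local, children):
    """Combine the node's own (H, P) with the (H, P) pairs of its children.
    A leaf has no children, which matches receiving H = 0 and P = 1."""
    H_local, P_local = local
    H = keyed_hash(K, sum(h for h, _ in children) + H_local)
    P = prod(p for _, p in children) * P_local
    return H, P


# Toy instance: s_1 is a leaf and the child of s_2.
K1, K2 = b"K_1,t", b"K_2,t"
keys1 = {(1, 1): 101, (1, 2): 102, None: 103}
primes1 = {(1, 1): 2, (1, 2): 3, None: 5}
keys2 = {(1, 1): 201, (1, 2): 202, None: 203}
primes2 = {(1, 1): 7, (1, 2): 11, None: 13}

leaf = aggregate(K1, local_proof(K1, keys1, primes1, [(1, 1), (1, 1), (1, 2)]), [])
root = aggregate(K2, local_proof(K2, keys2, primes2, []), [leaf])
```

In this toy instance $s_1$ contributes $2 \cdot 2 \cdot 3 = 12$ and $s_2$, having sensed nothing, contributes its empty-bucket prime 13, so the aggregated product is 156.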

Assume that the set $\{p_{\mathcal{M}}^V, p_{\mathcal{M}}^\emptyset \mid V \in \mathcal{V}\}$ of $w^d+1$ prime
numbers stored in $\mathcal{M}$ are all different from those stored in
sensor nodes, and the set $\{k_{\mathcal{M},0}^V, k_{\mathcal{M},0}^\emptyset \mid V \in \mathcal{V}\}$ of $w^d+1$
bucket keys is selected by the authority and stored in $\mathcal{M}$. $\mathcal{M}$
computes $k_{\mathcal{M},t}^V = hash(k_{\mathcal{M},t-1}^V)$ and drops $k_{\mathcal{M},t-1}^V$ at epoch
$t$. In addition, $\mathcal{M}$ also computes $k_{\mathcal{M},t}^\emptyset = hash(k_{\mathcal{M},t-1}^\emptyset)$ and
drops $k_{\mathcal{M},t-1}^\emptyset$ at epoch $t$. The storage node $\mathcal{M}$ can also
calculate $E_{\mathcal{M},t}$, $B_{\mathcal{M},t}$, $H_{\mathcal{M},t}$, and $P_{\mathcal{M},t}$ according to its
own sensed data at epoch $t$. In fact, the procedures $\mathcal{M}$ needs to
perform after messages are received from the child nodes are
the same as the ones performed by the sensor nodes. Acting
as the root of the aggregation tree, however, $\mathcal{M}$ keeps the
aggregated results, which are denoted as $\mathcal{E}_{\mathcal{M},t}$, $\mathcal{B}_{\mathcal{M},t}$, $\mathcal{H}_{\mathcal{M},t}$,
and $\mathcal{P}_{\mathcal{M},t}$, respectively, in its local storage and waits for the
query issued by the authority. Note that $\mathcal{P}_{\mathcal{M},t}$ can be thought
of as a compact summary of the sensed data of the whole
network and can be very useful for the authority in checking
the completeness of the query-result, while $\mathcal{H}_{\mathcal{M},t}$ can be used
by the authority to verify the authenticity of $\mathcal{P}_{\mathcal{M},t}$.

Assume that a range query $\langle C, t, l_1, h_1, l_2, h_2, \ldots, l_d, h_d \rangle$
is issued by the authority. The encrypted data falling into
the buckets in the set $\mathcal{A}$, along with the proof composed
of $\mathcal{H}_{\mathcal{M},t}$ and $\mathcal{P}_{\mathcal{M},t}$, are sent to the authority. Once $\mathcal{P}_{\mathcal{M},t}$
is received, the authority immediately performs the prime
factor decomposition of $\mathcal{P}_{\mathcal{M},t}$. Due to the construction of
$\{p_i^V, p_i^\emptyset, p_{\mathcal{M}}^V, p_{\mathcal{M}}^\emptyset \mid i \in [1,N-1], V \in \mathcal{V}\}$, which guarantees
that the bucket primes are all distinct, after the prime factor
decomposition of $\mathcal{P}_{\mathcal{M},t}$, the authority knows how many data
each node contributes to each specified bucket. As a
result, the authority knows which keys should be used to
verify the authenticity and integrity of $\mathcal{H}_{\mathcal{M},t}$. More specifically,
assume that $\mathcal{P}_{\mathcal{M},t} = (p_1)^{a_1} \cdots (p_\gamma)^{a_\gamma}$, $a_1, \ldots, a_\gamma > 0$,
$\gamma > 0$, and that $p_1, \ldots, p_\gamma$ are distinct prime numbers. From
the construction of $\mathcal{P}_{\mathcal{M},t}$, we know that $(p_k)^{a_k}$, for $k \in [1,\gamma]$,
is equal to $(p_{k'}^{k''})^{a_k}$, for $k' \in [1,N-1]$ and $k'' \in \mathcal{V}$. From
the procedure performed by each node, it can also be known
that the appearance of $(p_k)^{a_k} = (p_{k'}^{k''})^{a_k}$ in $\mathcal{P}_{\mathcal{M},t}$ means that
at epoch $t$ the sensor node $s_{k'}$ produces $a_k$ data falling into
bucket $k''$, contributing the bucket key $k_{k',t}^{k''}$ in total $a_k$ times in
$\mathcal{H}_{\mathcal{M},t}$. Here, the sensor node producing the data falling into
the bucket $\emptyset$ means that $s_{k'}$ senses nothing. Thus, we can infer
the total amount of data falling into specified buckets at epoch
$t$. Recall that the authority is aware of the topology of the
aggregation tree. Thus, after the prime factor decomposition
of $\mathcal{P}_{\mathcal{M},t}$, the authority can reconstruct $\mathcal{H}_{\mathcal{M},t}$ according to the
derived $a_k$'s and $p_k$'s by its own effort, because it knows $K_{i,t}$
and $k_{i,t}^V$, $\forall i \in \{1, \ldots, N-1, \mathcal{M}\}$, $\forall t \ge 0$, $\forall V \in \mathcal{V}$. Therefore,
the received $\mathcal{P}_{\mathcal{M},t}$ is considered authentic if and only if
the $\mathcal{H}_{\mathcal{M},t}$ reconstructed by the authority is equal
to the received $\mathcal{H}_{\mathcal{M},t}$. When the verification of $\mathcal{P}_{\mathcal{M},t}$ fails, $\mathcal{M}$
is considered compromised. When the verification of $\mathcal{P}_{\mathcal{M},t}$ is
successful, the authority decrypts all the received encryptions,
and checks whether the number of query-results falling into
the buckets in $\mathcal{A}$ matches those indicated by $\mathcal{P}_{\mathcal{M},t}$. If and
only if there are matches in all the buckets in $\mathcal{A}$, the received
query-results are considered complete.
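
The counting part of this verification can be illustrated by the following sketch. It assumes the authority holds a table, here called `prime_owner`, that maps every bucket prime to its (node, bucket) pair, so that trial division by the known primes suffices; `None` again denotes the empty bucket. The reconstruction of $\mathcal{H}_{\mathcal{M},t}$ is not shown, and all names are hypothetical.

```python
def decompose(P, prime_owner):
    """Factor the aggregated product over the known bucket primes.
    Returns {(node, bucket): exponent} for every prime dividing P."""
    counts = {}
    for p, owner in prime_owner.items():
        exponent = 0
        while P % p == 0:
            P //= p
            exponent += 1
        if exponent:
            counts[owner] = exponent
    if P != 1:
        raise ValueError("product contains an unknown prime factor")
    return counts


def is_complete(P, prime_owner, received_counts, queried_buckets):
    """Compare, for each queried bucket, the number of decrypted results
    with the number of data indicated by the aggregated product."""
    expected = {}
    for (_node, bucket), exponent in decompose(P, prime_owner).items():
        if bucket in queried_buckets:
            expected[bucket] = expected.get(bucket, 0) + exponent
    return all(received_counts.get(b, 0) == expected.get(b, 0)
               for b in queried_buckets)


prime_owner = {
    2: (1, (1, 1)), 3: (1, (1, 2)), 5: (1, None),
    7: (2, (1, 1)), 11: (2, (1, 2)), 13: (2, None),
}
P = 2 * 2 * 3 * 13  # s_1: two data in (1,1), one in (1,2); s_2: nothing
```

With this product, a reply containing only one datum for bucket (1,1) would be flagged as incomplete, since the exponent of the prime 2 indicates two.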

B. Securing Top-k Queries (STQ) 

Basically, the proposed STQ scheme for securing top-k
query is built upon SRQ in that both confidentiality-preserving
reporting and query-result completeness verification phases in
SRQ are exploited. In particular, based on the proof generated
in SRQ, since it can know which buckets contain data, the
authority can also utilize such information to examine the
completeness of query-results of top-k query. In other words,
top-k query can be secured by the use of the SRQ scheme.
Because of the similarity between the SRQ and STQ schemes,
some details of the STQ scheme will be omitted in the
following description.

Here, a bucket data set is defined to be composed of
bucket IDs. We use a d-dimensional tuple, $(v_1, \ldots, v_d)$, where
$v_g \in [1,w]$, $\forall g \in [1,d]$, to represent the bucket IDs in a bucket data
set. With this representation, we can use the ranking function,
$R$, to calculate the ranking value of each bucket ID.
Assume that the $v$-th interval in the $g$-th attribute contains
the values in $[u_{g,v}^{\ell}, u_{g,v}^{h}]$. The ranking value of the bucket ID,
$(v_1, \ldots, v_d)$, is evaluated as
$R\left(\frac{u_{1,v_1}^{\ell} + u_{1,v_1}^{h}}{2}, \ldots, \frac{u_{d,v_d}^{\ell} + u_{d,v_d}^{h}}{2}\right)$,
where the d-dimensional tuple,
$\left(\frac{u_{1,v_1}^{\ell} + u_{1,v_1}^{h}}{2}, \ldots, \frac{u_{d,v_d}^{\ell} + u_{d,v_d}^{h}}{2}\right)$,
whose individual entry is simply averaged over the minimum
and maximum values in each interval, acts as the representative
of the bucket $(v_1, \ldots, v_d)$ for simplicity.
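
A minimal sketch of the bucket ranking value follows. The layout of `bounds` (one list of (lower, upper) pairs per attribute) and the use of `sum` as the ranking function $R$ are illustrative assumptions.

```python
def representative(bucket, bounds):
    """Midpoint of each interval of the bucket.
    bounds[g][v - 1] = (lower, upper) of the v-th interval of attribute g."""
    return tuple((bounds[g][v - 1][0] + bounds[g][v - 1][1]) / 2
                 for g, v in enumerate(bucket))


def bucket_rank(bucket, bounds, R):
    """Ranking value of a bucket ID: R applied to its representative."""
    return R(representative(bucket, bounds))


bounds = [[(1, 10), (11, 20)], [(1, 10), (11, 20)]]
rank_12 = bucket_rank((1, 2), bounds, sum)  # R = sum of attributes (assumed)
```

For bucket (1,2) the representative is (5.5, 15.5), so a sum-based ranking function yields 21.0.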

Recall that we simply assume that the data with the
$k$ smallest ranking values are desired. The general form of
the message sent from $s_i$ to its parent node at the end of
epoch $t$ is $\langle i, t, \mathcal{E}_{i,t}, \mathcal{B}_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t} \rangle$, where $\mathcal{E}_{i,t}$, $\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$,
and $\mathcal{P}_{i,t}$ are the same as those defined in SRQ. Assume that
$\zeta_1, \ldots, \zeta_k \in \mathcal{V}$ are $k$ bucket IDs in the bucket data set whose
ranking values are the $k$ smallest ones. According
to $\mathcal{B}_{\mathcal{M},t}$, $\mathcal{M}$ can calculate the ranking values of bucket IDs
in $\mathcal{B}_{\mathcal{M},t}$ and, therefore, knows $\zeta_1, \ldots, \zeta_k$. To answer a top-k
query, $\langle C, t, R, k \rangle$, the storage node $\mathcal{M}$ reports the bucket IDs,
$\zeta_1, \ldots, \zeta_k$, and their corresponding encrypted data, along with
$\mathcal{H}_{\mathcal{M},t}$ and $\mathcal{P}_{\mathcal{M},t}$, to the authority because it can be known that
the data with the $k$ smallest ranking values must be within
$\zeta_1, \ldots, \zeta_k$. After receiving the query-result, the authority can
first verify the authenticity of $\mathcal{P}_{\mathcal{M},t}$ by using $\mathcal{H}_{\mathcal{M},t}$, and verify the
query-result completeness by using $\mathcal{P}_{\mathcal{M},t}$. Note that both of the
above verifications can be performed in a way similar to the
one described in Sec. III-A. Actually, after receiving $\mathcal{P}_{\mathcal{M},t}$, the
authority knows which buckets contain data and the amount of
data. Hence, knowing $u_{g,v}^{\ell}$ and $u_{g,v}^{h}$, $\forall g \in [1,d]$, $\forall v \in [1,w]$,
the authority can also obtain $\zeta_1, \ldots, \zeta_k$. Afterwards, what the
authority should do is to check if it receives the bucket IDs,
$\zeta_1, \ldots, \zeta_k$, and if the number of data in bucket $\zeta_{g'}$, $g' \in [1,k]$,
is consistent with the number indicated by $\mathcal{P}_{\mathcal{M},t}$. If and only
if these two verifications pass, the authority considers the
received query-result to be complete and extracts the top-k
result from the encrypted data sent from $\mathcal{M}$.
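
The two completeness checks for a top-k reply can be sketched as follows, assuming the per-bucket counts have already been recovered from the aggregated product as in SRQ. The names `top_k_buckets` and `check_top_k`, the tie-breaking by bucket ID, and the toy ranking function are our assumptions.

```python
def top_k_buckets(counts, rank, k):
    """The k non-empty bucket IDs with the smallest ranking values.
    counts: bucket ID -> number of data, as indicated by the proof."""
    return sorted(counts, key=lambda b: (rank(b), b))[:k]


def check_top_k(reported, counts, rank, k):
    """reported: bucket ID -> list of decrypted data returned by the
    storage node. Both the bucket IDs and the per-bucket counts must match."""
    expected = top_k_buckets(counts, rank, k)
    if set(reported) != set(expected):
        return False
    return all(len(reported[b]) == counts[b] for b in expected)


counts = {(1, 1): 2, (1, 2): 1, (2, 2): 3}
rank = sum  # toy ranking on bucket indices (assumed, for illustration only)
good = {(1, 1): ["c1", "c2"], (1, 2): ["c3"]}
bad = {(1, 1): ["c1"], (1, 2): ["c3"]}
```

A reply that omits one datum from a reported bucket, such as `bad` above, fails the count comparison even though the reported bucket IDs are correct.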

C. Securing Skyline Queries (SSQ) 

To support secure skyline query in sensor networks, in the 
following we first present a naive approach as baseline, and 
then propose an advanced approach that employs a grouping 
technique for simultaneously reducing the computation and 
communication cost. 



1) Baseline scheme: To ensure the data confidentiality and
authenticity, as in the SRQ and STQ schemes, the sensed data
are also encrypted by using the bucket scheme mentioned in
Sec. III-A1. At the end of epoch $t$, each $s_i$ broadcasts its
sensor ID, all the sensed data encrypted by key $K_{i,t}$, and
the proper bucket IDs to all the nodes within the same cell.
Then, according to the broadcast messages, each sensor node
$s_i$ at epoch $t$ has a bucket data set composed of the bucket
IDs extracted from broadcast messages and the bucket IDs
corresponding to its own sensed data. In fact, the bucket data
sets constructed by different nodes in the same cell at epoch
$t$ will be the same. Treating these bucket IDs as data points,
$s_i$ can find the set $\Phi_t$ of skyline buckets that are defined as
the bucket IDs not dominated by the other bucket IDs. Here,
since the bucket IDs are represented also by d-dimensional
tuples, the notion of domination is the same as the one defined
in the query model of Sec. II. Define quasi-skyline data as
the set of data falling into the skyline buckets. It can be
observed that the set of skyline data must be a subset of
quasi-skyline data. After doing so, each node can locally find$^2$ the
quasi-skyline data, although there could be the cases where
superfluous data are also included. At the end of epoch $t$, if
$K_{i,t}$ is smaller than a pre-determined threshold, then $s_i$ sends
its sensor ID and $hash_{K_{i,t}}(\Phi_t)$ to $\mathcal{M}$. Here, $hash_{K_{i,t}}(\Phi_t)$
works as a kind of proof so that it can be used for checking
the query-result completeness. Note that only $hash_{K_{i,t}}(\Phi_t)$
needs to be transmitted to $\mathcal{M}$ because $\mathcal{M}$ also receives all the
encrypted data and bucket IDs, and can calculate the
quasi-skyline data by itself after message broadcasting.
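
The local computation in the baseline scheme can be sketched as follows. HMAC-SHA256 over a canonical (sorted) encoding of the skyline bucket set is our assumption for the keyed hash; the function names are hypothetical, and the naive pairwise comparison is used.

```python
import hashlib
import hmac


def dominates(a, b):
    """a dominates b: no larger in every dimension, smaller in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline_buckets(bucket_ids):
    """Bucket IDs not dominated by any other bucket ID in the data set."""
    ids = set(bucket_ids)
    return sorted(b for b in ids if not any(dominates(o, b) for o in ids))


def quasi_skyline(data_by_bucket):
    """Restrict bucket ID -> (encrypted) data to the skyline buckets."""
    return {b: data_by_bucket[b] for b in skyline_buckets(data_by_bucket)}


def completeness_proof(key: bytes, sky) -> bytes:
    """Keyed hash of the skyline bucket set (assumed encoding)."""
    return hmac.new(key, repr(sorted(sky)).encode(), hashlib.sha256).digest()


sky = skyline_buckets([(1, 2), (2, 1), (2, 2), (1, 2)])
```

Since every node in the cell sees the same bucket data set, each of them derives the same skyline buckets, and the authority can recompute the expected proof with the corresponding node key.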

To answer the skyline query $\langle C, t \rangle$, the storage node reports
the quasi-skyline data calculated at epoch $t$ and the hash values
received at epoch $t$ to the authority. Since the authority
knows the threshold and $K_{i,0}$, $\forall i \in [1,N-1]$, it will expect to
receive hash values from a set of sensor nodes whose keys are
smaller than the threshold. This baseline scheme works, but due
to the network-wide broadcast it is inefficient
in terms of communication overhead. Hence, an efficient SSQ
scheme exploiting a grouping strategy is proposed as follows.

2) Grouping technique: Like the baseline scheme, the
bucket scheme mentioned in Sec. III-A1 is also used. Consider
a data set $O$ and a collection $\{O_1, \ldots, O_\tau\}$ of subsets of $O$
satisfying $\bigcup_{j=1}^{\tau} O_j = O$ and $O_j \cap O_{j'} = \emptyset$, $\forall j \ne j'$, for
$\tau \ge 1$. An observation is that the skyline data of $O$ must be
a subset of the union of the skyline data of $O_j$, $\forall j \in [1,\tau]$.
Thus, the key idea of our proposed SSQ scheme is to partition
the sensor nodes in a cell into groups so that broadcasting can
be limited within a group, resulting in reduced computation
and communication costs. In what follows, the SSQ scheme
will be described in more detail.
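
The observation behind the grouping technique can be checked on a toy data set with the following sketch, which again uses the naive pairwise skyline computation; the data values and function names are illustrative.

```python
def dominates(a, b):
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(data):
    return [c for c in data if not any(dominates(o, c) for o in data)]


def merged_skyline(groups):
    """Compute the skyline of each group separately, then the skyline of
    the union of the per-group skylines."""
    candidates = [c for group in groups for c in skyline(group)]
    return skyline(candidates)


groups = [[(1, 5), (3, 3), (6, 6)], [(2, 2), (5, 1), (4, 4)]]
O = [c for group in groups for c in groups[0] + groups[1] if c in group][:0] or \
    [c for group in groups for c in group]
per_group_union = [c for group in groups for c in skyline(group)]
```

Here (3, 3) survives within its own group but is dominated by (2, 2) from the other group, which illustrates why the union of the per-group skylines is a superset of, and not necessarily equal to, the skyline of $O$.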

The sensor nodes in a cell are divided into /i disjoint 
groups, G v , V?7 G each of which is composed of \G V \ 

2 The design of an algorithm for efficiently finding the skyline data given a 
data set is a research topic, but is beyond the scope of this paper. We consider 
the naive data- wise comparison-based algorithm with running time 0(n 2 ) if 
the size of the data set is n, but each node, in fact, can implement an arbitrary 
algorithm for finding skyline data in our setting. 
sensor nodes. The grouping needs to be performed only once
right after the sensor deployment. Note that each group is
formed by nearby sensor nodes, and the grouping procedure
is independent of the structure of the aggregation tree and thus
does not affect SRQ and STQ. Let a cell-region be the part of the
sensing region monitored by one specified cell. For example, assuming
that the shape of each cell-region is approximately
a square, as shown in Fig. 1, and that sensor nodes are uniformly
deployed, the grouping can be achieved by
simply dividing a cell-region into $\mu$ ($= \sqrt{N}$) sub-cell-regions.
The sensor nodes in the same sub-cell-region form a group.
Note that the square cell-region is assumed here for ease of
explanation, but is not necessary for the grouping procedures$^3$.

At the end of epoch $t$, each sensor node $s_i$ broadcasts its
sensor ID, its order seed $\theta_{i,t} = \mathrm{hash}_{K_{i,t}}(i)$, all the sensed data
encrypted with the key $K_{i,t}$, and the proper bucket IDs to all
the nodes within the same group. After doing so, as in the baseline
scheme, the sensor nodes in the same group can locally find
the quasi-skyline data from the bucket data set whose entries
are generated by the sensor nodes in the same group. Let $\mathcal{B}_{\eta,t}$
be the set of skyline bucket IDs and $\phi_{\eta,t}$ their corresponding
quasi-skyline data encrypted by the sensor nodes in group $\eta$
using proper keys at epoch $t$. At the end of epoch $t$, if $\theta_{i,t}$
is among the $\xi_1$ smallest ones in the set of order seeds
in group $\eta$ at epoch $t$, where $\xi_1$ is a pre-determined threshold
known by each node, then $s_i$ reports $\mathcal{B}_{\eta,t}$, $\phi_{\eta,t}$, its verification
seed $\mathrm{hash}_{K_{i,t}}(\mathcal{B}_{\eta,t})$, and the IDs of sensor nodes generating
$\phi_{\eta,t}$ to $\mathcal{M}$. In fact, $\xi_1 = 1$ is sufficient for the verification
purpose. $\xi_1$, however, is also related to the resiliency against
sensor node compromises. Thus, we still keep $\xi_1$ as a variable
and defer the explanation of the purpose of $\xi_1$ to Sec. V.
Here, the purpose of verification seeds is that the completeness
of quasi-skyline data can be guaranteed by exactly $\xi_1$ sensor
nodes for each group, while the purpose of the order seed is to
guarantee that at each epoch exactly $\xi_1$ sensor nodes will send
the verification seeds as the proofs.
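The selection of the reporting nodes can be sketched as follows. This is an illustration only: HMAC-SHA256 stands in for the keyed hash, and the key strings, the byte encoding of IDs and bucket sets, and all function names are assumptions rather than the paper's specification.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, message: bytes) -> bytes:
    """Stand-in for hash_K(.): HMAC-SHA256 (an assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def order_seed(key: bytes, sensor_id: int) -> bytes:
    """Order seed of a node: keyed hash of its own ID under its epoch key."""
    return keyed_hash(key, str(sensor_id).encode())

def reporters(group_keys: dict, xi1: int) -> list:
    """IDs of the nodes whose order seeds are the xi1 smallest in the group."""
    seeds = {i: order_seed(k, i) for i, k in group_keys.items()}
    return sorted(seeds, key=seeds.get)[:xi1]

def verification_seed(key: bytes, skyline_bucket_ids: list) -> bytes:
    """Keyed hash over the group's skyline bucket IDs (hypothetical encoding)."""
    payload = ",".join(str(b) for b in sorted(skyline_bucket_ids)).encode()
    return keyed_hash(key, payload)

# Hypothetical epoch keys of the five nodes in one group.
group_keys = {i: f"epoch-key-{i}".encode() for i in range(1, 6)}
chosen = reporters(group_keys, xi1=2)
seeds = [verification_seed(group_keys[i], [3, 7, 11]) for i in chosen]
```

Because every node in the group hears all order seeds, each node can decide locally whether it is among the `xi1` reporters, and anyone holding the epoch keys can recompute the same set.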

To answer a skyline query $(C, t)$, the storage node $\mathcal{M}$
reports to the authority a hash of all the received verification seeds,
$h_t = \mathrm{hash}(\Vert_{i \in \Gamma_t} \mathrm{hash}_{K_{i,t}}(\mathcal{B}_{\eta,t}))$,
together with the set of skyline bucket IDs and their corresponding
encrypted data received at epoch $t$. Here, $\Vert$ denotes the bit-string
concatenation and $\Gamma_t$ is the set of sensor nodes responsible
for sending a hash value to $\mathcal{M}$ in each group at epoch $t$.
Since it knows the threshold $\xi_1$ and $K_{i,t}$,
$\forall i \in \{1, \ldots, N-1, \mathcal{M}\}$, the authority
will expect to receive a particular hash value from $\mathcal{M}$. If
and only if the hash sent from $\mathcal{M}$ matches the hash of
the verification seeds calculated by the authority itself according
to its knowledge of $\Gamma_t$ and $K_{i,t}$, the received data are
considered complete and contain the skyline data.
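A minimal sketch of the completeness check at the authority is given below. It assumes HMAC-SHA256 as the keyed hash, SHA-256 as the outer hash, a fixed agreed order of the responsible nodes, and a simple text encoding of bucket IDs; none of these choices come from the paper, and the names are hypothetical.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, message: bytes) -> bytes:
    """Stand-in for hash_K(.): HMAC-SHA256 (an assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def encode_buckets(bucket_ids) -> bytes:
    """Hypothetical canonical encoding of a set of bucket IDs."""
    return ",".join(str(b) for b in sorted(bucket_ids)).encode()

def proof(responsible, buckets_by_group) -> bytes:
    """h_t: hash of the concatenated verification seeds.
    responsible: list of (group, key) pairs in a fixed, agreed order."""
    seeds = [keyed_hash(key, encode_buckets(buckets_by_group[g]))
             for g, key in responsible]
    return hashlib.sha256(b"".join(seeds)).digest()

def authority_accepts(h_t, reported_buckets, responsible) -> bool:
    """Recompute h_t from the reported bucket IDs and compare."""
    return hmac.compare_digest(h_t, proof(responsible, reported_buckets))

# One responsible node per group (xi_1 = 1), with hypothetical keys.
responsible = [(0, b"key-s3"), (1, b"key-s8")]
true_buckets = {0: [2, 5], 1: [7]}
# In the protocol the seeds come from the sensors; the storage node only hashes them.
h_t = proof(responsible, true_buckets)
honest = authority_accepts(h_t, true_buckets, responsible)
tampered = authority_accepts(h_t, {0: [2], 1: [7]}, responsible)
```

The storage node never holds the sensor keys, so it cannot recompute a matching `h_t` for a reply from which a skyline bucket has been dropped.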



3 For example, the authority knowing the position of each node or the use of 
clustering algorithms can also divide nodes into groups. In general, after the 
localization, each sensor node can join the proper group possibly according 
to its geographic position when the grouping information such as the sizes of 
sensing region and cell-region are preloaded in sensor nodes. 



IV. Performance Evaluation 

We will focus on analyzing the critical issue of detecting 
an incomplete query-result in tiered networks. In this section, 
the detection probability and communication cost of query- 
result completeness verification in the proposed schemes will 
be analyzed. It is assumed that the number of hops between
$\mathcal{M}$ and each sensor node is $\sqrt{n}$ for a collection of $n$ uniformly
deployed nodes [1]. In this section, both detection probability
and communication cost are discussed at a fixed epoch $t$.

As the communication cost of encoding approach [23] 
grows exponentially with the number of attributes, and some 
crosscheck approaches [31], [40] have relatively low detection 
probability, in the following, we compare SRQ with only 
hybrid crosscheck [40], which achieves the best balance be- 
tween the detection probability and communication cost in the 
literature. Note that, the parameter setting required in hybrid 
crosscheck is the same as that listed in [40]. 

A. Detection Probability 

The detection probability is defined as the probability that
the compromised storage node $\mathcal{M}$ is detected if it returns
an incomplete query-result. Since the larger the
portion of the query-result $\mathcal{M}$ drops, the higher the probability that
the authority detects it, we consider the worst case, in which only
one bucket and its corresponding data in the query-result are
dropped by $\mathcal{M}$ and the number of data sensed by a node is
either 0 or 1, as the lower bound of the detection probability.

1) Detection probability for SRQ and STQ: To return an
incomplete query-result without being detected by the authority,
$\mathcal{M}$ should create a proof, $(\mathcal{H}_{\mathcal{M},t}, \mathcal{P}_{\mathcal{M},t})$, corresponding
to the incomplete query-result. Since bucket primes can be
known by the adversary, $\mathcal{P}_{\mathcal{M},t}$ can be easily constructed.
Nevertheless, $\mathcal{H}_{\mathcal{M},t}$ cannot be constructed, since the bucket
keys of sensor nodes generating the bucket dropped by $\mathcal{M}$ are
not known by the adversary. Therefore, only two options can
be chosen by the adversary. First, the adversary can directly
guess $\mathcal{H}_{\mathcal{M},t}$, with probability $2^{-\ell_h}$, where $\ell_h$
is the number of bits output by a keyed hash function. This
implies that the detection probability $P_{det,1}^{SRQ}$ is $1 - 2^{-\ell_h}$ for
the first case. Second, knowing the aggregation tree topology,
the adversary can also follow the rule of SRQ to construct
$\mathcal{H}_{\mathcal{M},t}$ without considering the bucket key of the dropped bucket.
Assume that the probability that a sensor node has sensed
data is $\delta$. The size of the bucket key pool can be derived as
$(N - 2)(2\delta + 1 - \delta) + 2 = N\delta + N - 2\delta$. Thus, the probability
for the adversary to guess successfully is $2^{-\ell_k(N\delta + N - 2\delta)}$,
where $\ell_k$ is the number of bits of a key, leading to the
detection probability $P_{det,2}^{SRQ} = 1 - 2^{-\ell_k(N\delta + N - 2\delta)}$ for the
second case. Overall, the final detection probability, $P_{det}^{SRQ}$,
is $\min\{P_{det,1}^{SRQ}, P_{det,2}^{SRQ}\}$. On the other hand, as stated in Sec.
III-B1, the STQ scheme is built upon the SRQ scheme. Thus,
the detection probability, $P_{det}^{STQ}$, will be the same as $P_{det}^{SRQ}$.
As Fig. 2 depicts, the detection probability of SRQ is close
to 1 in any case. However, hybrid crosscheck is effective only
when a few sensed data are generated in the network. Such
a performance difference can be attributed to the fact that the
sensed data in the network are securely and deterministically
summarized in the proof of SRQ but they are probabilistically
summarized in hybrid crosscheck.
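As a worked instance of the two cases, the sketch below evaluates which guessing strategy is easier for the adversary. The value $\ell_h = 80$ matches the setting used for Fig. 3; the key length, the network size, and $\delta$ are illustrative assumptions, and the function names are hypothetical.

```python
def srq_escape_exponents(n_nodes, delta, hash_bits, key_bits):
    """Bits the adversary must guess in each case:
    case 1 guesses the aggregated hash, case 2 guesses the bucket keys."""
    pool = n_nodes * delta + n_nodes - 2 * delta  # N*delta + N - 2*delta
    return hash_bits, key_bits * pool

def srq_detection_probability(n_nodes, delta, hash_bits, key_bits):
    """min{P_det,1, P_det,2}: governed by the smaller exponent."""
    weakest = min(srq_escape_exponents(n_nodes, delta, hash_bits, key_bits))
    return 1 - 2.0 ** (-weakest)

# Assumed setting: N = 500 nodes, delta = 0.5, 80-bit hash, 80-bit keys.
exponents = srq_escape_exponents(500, 0.5, 80, 80)
p_det = srq_detection_probability(500, 0.5, 80, 80)
```

With these numbers the key-guessing exponent is far larger than the hash length, so the overall detection probability is governed by $1 - 2^{-\ell_h}$; in double precision this value is indistinguishable from 1.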




Fig. 2: The detection probability of SRQ and hybrid crosscheck in the cases that (a)
$Y = 100$ and (b) $Y = 5000$.

2) Detection probability for SSQ: To return an incomplete
query-result without being detected by the authority,
$\mathcal{M}$ should forge a proof, i.e., a hash $h_t$ of all the received
verification seeds, corresponding to the incomplete query-result.
Therefore, only two options can be chosen by the
adversary. First, the adversary can directly guess a hash value
$h_t$. The probability for the adversary to guess successfully is
$2^{-\ell_h}$, implying the detection probability $P_{det,1}^{SSQ}$ is $1 - 2^{-\ell_h}$
in this case. Second, the adversary can also follow the rule
of SSQ to construct $h_t$ for incomplete data. In this case, the
adversary is forced to guess $\xi_1$ keys for each one of the $\mu$ groups,
leading to the probability of a successful guess being $2^{-\ell_k \mu \xi_1}$ and
the detection probability being $P_{det,2}^{SSQ} = 1 - 2^{-\ell_k \mu \xi_1}$. The
final detection probability, $P_{det}^{SSQ}$, is thus $\min\{P_{det,1}^{SSQ}, P_{det,2}^{SSQ}\}$.
Obviously, $P_{det}^{SSQ}$ will also be close to 1 when an appropriate key
length or hash function is selected.

B. Communication Cost 

The communication cost, $T$, is defined as the number of bits
in the communications required for the proposed schemes. We
are mainly interested in the asymptotic result in terms of $d$
and $N$ because they reflect the scalability in the number of
attributes and the network size, respectively. We do not count
the number of bits in representing data, $\mathcal{E}_{i,t}$, since the sending
of $\mathcal{E}_{i,t}$ is necessary in any data collection scheme. We further
assume that there are on average $p_{dtob}Y$ data buckets, where
$0 < p_{dtob} < 1$, generated in cell $C$.



1) Communication Cost of SRQ and STQ: Each sensor
node $s_i$ in SRQ is required to send $B_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ to
its parent node. Nevertheless, $B_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ can be
aggregated along the path in the aggregation tree. As a consequence,
$s_i$ actually has only one one-hop broadcast containing
$B_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ at each epoch. In addition, to answer
a range query, $\mathcal{M}$ is responsible for sending $\mathcal{H}_{\mathcal{M},t}$, $\mathcal{P}_{\mathcal{M},t}$,
and the bucket IDs in $A$. In summary, the communication
cost, $T^{SRQ}$, can be calculated as
$(N - 1)(\ell_h + \ell_p) + p_{dtob}Y\lceil \log w \rceil d \log N + \ell_h + \ell_p + |A|\lceil \log w \rceil d = O(N + d\log N)$,
where $\ell_p$ is the number of bits used to represent the
bucket prime $\mathcal{P}_{\mathcal{M},t}$. Due to the similarity between SRQ and
STQ, the communication cost, $T^{STQ}$, can also be calculated
as $O(N + d\log N)$ in a way similar to the one for obtaining
$T^{SRQ}$.




Fig. 3: The communication cost of SRQ and hybrid crosscheck in the cases that (a)
$Y = 100$ and (b) $Y = 5000$.

As shown in Fig. 3, where the parameters $\ell_h = 80$ and
$\ell_p = 1000$ are used, the communication cost of SRQ
is significantly lower than that of hybrid crosscheck. More
specifically, the communication cost of hybrid crosscheck
can be asymptotically represented as $O(N^2 + Nd)$ and
increases drastically with $N$ and $d$. The proposed SRQ scheme,
in contrast, exhibits low communication cost regardless of the
amount of sensed data in the network, because the
size of the proof used in SRQ is always a constant. Hence,
the communication cost of SRQ is dominated by the
aggregation procedure, the average hop distance between $\mathcal{M}$
and each node, and the transmission of bucket IDs, resulting
in $O(N + d\log N)$ communication cost.
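To make the SRQ cost expression concrete, the sketch below evaluates its three components numerically. It assumes base-2 logarithms and reads $\lceil \log w \rceil d$ as the number of bits of one bucket ID; the sample parameter values other than $\ell_h = 80$ and $\ell_p = 1000$ are illustrative and not taken from the paper.

```python
import math

def srq_cost_bits(n, d, y, p_dtob, w, answer_buckets,
                  hash_bits=80, prime_bits=1000):
    """Evaluate (N-1)(l_h+l_p) + p_dtob*Y*ceil(log w)*d*log N
    + l_h + l_p + |A|*ceil(log w)*d, with base-2 logs (an assumption)."""
    bucket_id_bits = math.ceil(math.log2(w)) * d
    aggregation = (n - 1) * (hash_bits + prime_bits)      # one broadcast per node
    bucket_ids = p_dtob * y * bucket_id_bits * math.log2(n)
    reply = hash_bits + prime_bits + answer_buckets * bucket_id_bits
    return aggregation + bucket_ids + reply

# Illustrative setting: N = 1024, d = 2, Y = 100, p_dtob = 0.5, w = 16, |A| = 10.
total = srq_cost_bits(n=1024, d=2, y=100, p_dtob=0.5, w=16, answer_buckets=10)
doubled = srq_cost_bits(n=2048, d=2, y=100, p_dtob=0.5, w=16, answer_buckets=10)
```

In this setting the aggregation term dominates, so doubling $N$ roughly doubles the cost, in line with the linear term of the asymptotic bound.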

2) Communication Cost of SSQ: Since grouping is only
performed once after sensor deployment, we ignore its
communication cost. Note that, in the following, the communication
cost of the data should be counted because it is involved
in the design of SSQ. After the grouping, each $s_i$ broadcasts
the data bucket IDs, sensor ID, and the order seed to all the
nodes within the same group. Assuming that the nodes employ
a duplicate suppression algorithm, by which each node only
broadcasts a given message once, one node should broadcast
a message with $\frac{Y}{N}\ell_d + \frac{p_{dtob}Y}{N}\lceil \log w \rceil d + \ell_{id} + \ell_h$ bits, where
$\ell_d$ is the average size of a datum and $\ell_{id}$ is the number of bits
required to represent sensor IDs, resulting in a communication
cost of $C_1 = |G_\eta|^2 (\frac{Y}{N}\ell_d + \frac{p_{dtob}Y}{N}\lceil \log w \rceil d + \ell_{id} + \ell_h)$ bits
in each group. Let $0 < p_q < 1$ be the average ratio of
the quasi-skyline data to all the sensed data. $|\mathcal{B}_{\eta,t}|$ is equal
to $p_{dtob}p_qY$ on average. Then, once the order seed is
among the $\xi_1$ smallest ones of the order seeds in a group,
$s_i \in G_\eta$ is required to send $\mathcal{B}_{\eta,t}$, $\phi_{\eta,t}$, $\mathrm{hash}_{K_{i,t}}(\mathcal{B}_{\eta,t})$,
its sensor ID, and the IDs of sensor nodes generating buckets
in $\mathcal{B}_{\eta,t}$ to $\mathcal{M}$, implying the communication cost of $C_2 =
p_{dtob}p_qY\lceil \log w \rceil d + p_qY\ell_d + \ell_h + \ell_{id} + |G_\eta|\ell_{id}$ bits in the worst
case. Note that the verification seeds cannot be aggregated,
although the verification seed can also be delivered along the
path to $\mathcal{M}$ on the aggregation tree. $\mathcal{M}$ needs to send the
skyline bucket IDs, its sensor ID, and a hash $h_t$ as the proof to
the authority for answering the query. The communication cost
of $\mathcal{M}$ is $C_3 = \ell_{id} + \ell_h + \sum_{\eta=1}^{\mu} (p_{dtob}p_qY\lceil \log w \rceil d + p_qY\ell_d)$.
Consequently, the upper bound of communication cost, $T^{SSQ}$,
can be obtained as $\mu C_1 + \mu\xi_1 C_2 \log N + C_3 = O(N^{3/2} + Nd)$
when $\mu = \sqrt{N}$ and $|G_\eta| = \sqrt{N}$, $\forall \eta \in [1, \mu]$. By similar
derivation, the communication cost of the baseline scheme is
$O(N^2 d)$. Thus, exploiting the proposed grouping technique
does reduce the required communication cost. The trends of
communication cost in SSQ are shown in Fig. 4.




Fig. 4: The communication cost of SSQ for (a) $Y = 100$ and (b) $Y = 5000$.

V. Impact of Sensor Node Compromise 

In the previous discussions, we have ignored the impact of
the compromise of sensor nodes. In practice, the adversary
could take control of sensor nodes to enhance its ability
to perform malicious operations. The notation $\mathfrak{s}$ is used
to denote a set of random sensor nodes compromised by
the adversary. In the collusion attack considered here, $\mathcal{M}$
colludes with $\mathfrak{s}$ in the hope that more portions of query-results
generated by innocent sensor nodes can be dropped. Since
crosscheck approaches [40] suffer from collusion attacks, the
impact of the collusion attack on secure range query for tiered
sensor networks was addressed in [40]. Their proposed method
is random probing, by which the authority occasionally checks
whether some randomly selected sensor nodes have sensed no data
by directly communicating with them. Random probing,
however, can only discover incomplete query-results in
an inefficient, luck-dependent way and cannot identify $\mathfrak{s}$.

On the other hand, in this paper, we identify a new Denial-of-Service
attack never addressed in the literature, called the false-incrimination
attack, by which $\mathfrak{s}$ provides false sub-proofs to
the innocent storage node so that the innocent storage node
will be regarded as the compromised one and be revoked.
Unfortunately, all the prior works [23], [31], [40] suffer from
this attack. In summary, with minor modifications involved,
our proposed SRQ, STQ, and SSQ schemes are resilient
against both the collusion attack and the false-incrimination attack.

It should be especially noted that, in the following discussion
of SRQ, we temporarily make two unrealistic assumptions:
that the compromised nodes can only disobey the procedures
of the proposed schemes$^4$, and that the (subtree) proofs (defined
later) will not be manipulated by the compromised storage
and sensor nodes. These assumptions let us emphasize the effectiveness
of our proposed technique in identifying compromised nodes.
Nevertheless, these two assumptions will be relaxed later.

Impact of Sensor Node Compromise on SRQ and STQ.
Under the above two assumptions, the SRQ scheme is inherently
resilient against the collusion attack, because, regardless
of the existence and position of $\mathfrak{s}$, the proper bucket keys
will be embedded into the proofs and cannot be removed
by $\mathfrak{s}$. Nevertheless, SRQ could be vulnerable to the false-incrimination
attack since false sub-proofs injected by $\mathfrak{s}$ will
be integrated with the other correct sub-proofs to construct
a false proof, leading to the revocation of the innocent $\mathcal{M}$. Here,
we present a novel technique called subtree sampling, enabling
SRQ, with a slight modification, to efficiently mitigate the
threat of false-incrimination attacks. The idea of subtree
sampling is to check whether the proof constructed by the nodes
in a random subtree with fixed depth is authentic, so as
to perform the attestation only on the remaining suspicious
nodes. Let $m$ be a user-selected constant indicating the
subtree depth. In the modified SRQ scheme, once receiving
$(j_p, t, \mathcal{E}_{j_p,t}, B_{j_p,t}, \mathcal{H}_{j_p,t}, \mathcal{P}_{j_p,t}, \vartheta^0_{j_p,t}, \ldots, \vartheta^{m-1}_{j_p,t})$,
$j_p \in [1, N-1]$, $\forall p \in [1, \chi]$, from its $\chi$ children,
$s_{j_1}, \ldots, s_{j_\chi}$, each $s_i$
calculates $\mathcal{E}_{i,t}$, $B_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ as in the original SRQ
scheme. Note that, if $s_i$ is a leaf node on the aggregation
tree, it is assumed that $s_i$ receives $(0, 0, 0, 0, 0, 1, 0, \ldots, 0)$. In
the modified SRQ scheme, however, $s_i$ additionally performs
4 In other words, compromised nodes are assumed not to inject bogus sensor
readings. They can only manipulate their own subproofs and the proofs sent from
their descendant sensor nodes.
the following operations. Assume that $H^u_{j_p,t}, P^u_{j_p,t} \in \vartheta^u_{j_p,t}$,
$\forall u \in [0, m-1]$, $\forall j_p \in [1, N-1]$, and $\forall p \in [1, \chi]$. $s_i$
calculates $H^u_{i,t} = \mathrm{hash}_{K_{i,t}}(\sum_{p=1}^{\chi} H^{u-1}_{j_p,t} + H_{i,t})$ and
$P^u_{i,t} = \prod_{p=1}^{\chi} P^{u-1}_{j_p,t} \cdot P_{i,t}$, $\forall u \in [1, m]$. $s_i$ also calculates $H^0_{i,t} = H_{i,t}$
and $P^0_{i,t} = P_{i,t}$, where $H_{i,t}$ and $P_{i,t}$ are computed in the way
stated in Sec. III-A1. Then, $H^u_{i,t}$ and $P^u_{i,t}$ are assigned to the set
$\vartheta^u_{i,t}$, $\forall u \in [0, m-1]$. If $\mathrm{hash}_{K_{i,t}}(i) < \xi_2$, where $\xi_2$ is a
pre-determined threshold known by each node and will be
analyzed later, then $s_i$ sends $\vartheta^m_{i,t}$ to the (possibly compromised)
$\mathcal{M}$. Let $T^m_i$ be a subtree of the underlying aggregation tree,
rooted at $s_i$ with depth $m$. The $\vartheta^m_{i,t}$ generated by $s_i$ can be thought
of as the subtree proof of the data sensed by the nodes in $T^m_i$.
Finally, $s_i$ sends $(i, t, \mathcal{E}_{i,t}, B_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t}, \vartheta^0_{i,t}, \ldots, \vartheta^{m-1}_{i,t})$ to
its parent node. Let $W_t$ be the witness set of sensor nodes
satisfying $\mathrm{hash}_{K_{i,t}}(i) < \xi_2$ at epoch $t$. The nodes in $W_t$ are
called witness nodes at epoch $t$.




Fig. 5: The conceptual diagrams of identifying the compromised nodes. (a) Only the
nodes in the red area need to be attested. (b) Only the nodes in the gray area need to be attested.

We first consider the simplest case, where no compromised
nodes act as the witness nodes. We further assume for simplicity that only
one compromised node (i.e., $|\mathfrak{s}| = 1$) injects a false sub-proof;
our method can be adapted to the case of
multiple compromised nodes injecting false sub-proofs. We
have the following observation regarding each witness node $s_i$
and its generated $\vartheta^m_{i,t}$. Assume that the query-result is found to
be incomplete at epoch $t$. The authority requests the $\vartheta^m_{i,t}$ of each
witness node $s_i$ stored in $\mathcal{M}$. To identify the compromised
nodes, all the nodes at first are considered neutral. The nodes
in $T^m_i$ become innocent if the verification of $\vartheta^m_{i,t}$ passes, and
the nodes in $T^m_i$ become suspicious otherwise$^5$. The above
verification is performed as follows. According to the $P^m_{i,t}$ in
the subtree proof $\vartheta^m_{i,t}$, the authority knows which nodes in the
subtree $T^m_i$ contribute data and their amount. The authority

5 Note that there would be the cases that a node is deemed to be both 
innocent and suspicious when different subtree proofs are considered. When 
such a case happens, that node obviously should be innocent. 



can, based on this information, calculate $H^m_{i,t}$ by itself. Define
$\mathcal{E}_{T^m_i,t}$ as the set of data sensed by the nodes in $T^m_i$, and $B_{T^m_i,t}$
as the corresponding bucket IDs of $\mathcal{E}_{T^m_i,t}$. As a consequence,
the verification passes if and only if the $H^m_{i,t}$ calculated by
the authority itself is equal to the $H^m_{i,t}$ extracted from the
received $\vartheta^m_{i,t}$ and the amount of the data in $\mathcal{E}_{T^m_i,t}$
falling into the specified buckets in $B_{T^m_i,t}$ matches the one
indicated in $P^m_{i,t}$. As a whole, each time the above checking
procedures are performed according to $\vartheta^m_{i,t}$ at epoch $t$, nodes
will be partitioned into three sets, the innocent set $\mathfrak{I}^i_t$, the neutral
set, and the suspicious set $\mathfrak{S}^i_t$, containing innocent nodes, neutral
nodes, and suspicious nodes, respectively. We can conclude
that $(\bigcap_{i \in W_t, \mathfrak{S}^i_t \neq \emptyset} \mathfrak{S}^i_t) \cup \{\mathcal{M}\}$ contains at least one compromised
node injecting the false subproof if at least one $\mathfrak{S}^i_t$ is
nonempty, and $(\{s_1, \ldots, s_{N-1}\} \setminus (\bigcup_{i \in W_t} \mathfrak{I}^i_t)) \cup \{\mathcal{M}\}$ contains
at least one compromised node injecting the false subproof
otherwise. In more detail, the authority at first only performs
the attestation [25], [26], [28] on the nodes in $\bigcap_{i \in W_t, \mathfrak{S}^i_t \neq \emptyset} \mathfrak{S}^i_t$
or in $\{s_1, \ldots, s_{N-1}\} \setminus (\bigcup_{i \in W_t} \mathfrak{I}^i_t)$. Nonetheless, after the
attestation, if the nodes being attested are ensured to be not
compromised, then $\mathcal{M}$ should be the compromised node. It
can be observed that the set of nodes on which the
authority needs to perform the attestation may
shrink drastically, so that the computation and communication
cost required in the attestation will be substantially
reduced as well. It can also be observed that each time the
attestation is performed, at least one compromised node can
be recovered. The intuition behind the checking procedure is
illustrated in Fig. 5. In addition, Fig. 6 depicts the number
of nodes to be attested after an incomplete reply is found
in different settings. Since the number of witness nodes is
approximately $\xi_2(N - 1)/2^{\ell_h}$, it can be observed from Fig. 6
that the larger the $\xi_2$ and $m$, the lower the number of nodes
to be attested. Nevertheless, when $\xi_2$ and $m$ become larger,
the communication cost of the modified SRQ scheme, which
will be presented later, is increased as well.
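The narrowing of the attestation set can be sketched with plain set operations. This is an illustration under the single-compromised-node assumption made above; the function name and the node IDs are hypothetical, and removing nodes that some passing subtree proof cleared follows the rule of footnote 5.

```python
def nodes_to_attest(all_nodes, innocent_sets, suspicious_sets):
    """Sensor nodes the authority attests first.
    innocent_sets / suspicious_sets: one set per witness node's subtree proof."""
    cleared = set().union(*innocent_sets)
    nonempty = [s for s in suspicious_sets if s]
    if nonempty:
        # Some subtree proof failed: the culprit lies in every failing subtree.
        candidates = set.intersection(*nonempty)
    else:
        # No subtree proof failed: the culprit is outside all sampled subtrees.
        candidates = set(all_nodes)
    # A node cleared by any passing proof is treated as innocent (footnote 5).
    return candidates - cleared

all_nodes = range(1, 11)                        # s_1 .. s_10 (hypothetical)
case_a = nodes_to_attest(all_nodes,
                         innocent_sets=[{1, 2, 3}, {4, 5}],
                         suspicious_sets=[{6, 7, 8}, {7, 8, 9}])
case_b = nodes_to_attest(all_nodes,
                         innocent_sets=[{1, 2, 3}, {4, 5}],
                         suspicious_sets=[set(), set()])
```

In both cases the storage node itself remains a suspect: if attesting the returned nodes finds nothing, the authority concludes that the storage node is the compromised one.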

There would, however, still be cases where the compromised
nodes luckily act as witness nodes, so that they can be
considered innocent by sending genuine subtree proofs to $\mathcal{M}$.
This case does happen, but our technique
still mitigates the threat of false-incrimination
attacks because the effectiveness of false-incrimination attacks
is now limited to the case where some compromised nodes
work as witness nodes.

Now, we have to remove the two unrealistic assumptions we
made before, allowing bogus data to be injected and
the subtree proof sent from the witness node to be
maliciously altered on its way to the authority. After
the removal of these two unrealistic assumptions, SRQ is
still resilient against collusion attacks because the use of
the proofs guarantees that the misbehavior of deleting the
sensed data will be detected. Unfortunately, on the one hand,
the compromised nodes injecting the bogus data can deceive
the authority into accepting the falsified sensor reading. On
the other hand, the compromised nodes lying on the path
Fig. 6: The number of nodes to be attested for (a) $N = 500$ and (b) $N = 1000$.



between $\mathcal{M}$ and witness nodes can manipulate the subtree
proofs so that the compromised nodes can avoid the detection
and the innocent $\mathcal{M}$ can still be falsely incriminated. In our
strategy, we use the redundancy property, which has been
widely used in the design of other security protocols in
WSNs such as en-route filtering [34], [35], [37], [39], to
mitigate the former threat, while we, motivated by the IP
traceback technique in the Internet security literature, develop a
new traceback technique suitable for WSNs to resist
the latter attack. Here, the redundancy property means that
WSNs are usually densely deployed so that an event in the
sensing region can be simultaneously detected by multiple
nodes. In general, the redundancy property can be obtained
by achieving the so-called $\iota$-coverage [10], [18], [32], and
therefore an event can be simultaneously observed by at least
$\iota$ nodes. In the following description of our remedy to these
problems, due to its similarity to our modified SRQ scheme
previously mentioned, we will omit some notational details
and focus on the procedures themselves.

When the redundancy property is used, in essence, our
SRQ does not need to be changed. Assume for now that the
authority issues a range query and receives the query-result
from $\mathcal{M}$. In addition, we also assume that all the checking
procedures stated in (modified) SRQ are passed. The authority
now wants to know whether the received data are falsified by
and sent from the compromised nodes. Recall that each node
can be aware of its geographic position. After knowing from the
query-result which nodes contribute the sensed data to its issued
query, the authority further acquires the encrypted data
of the neighbors of those nodes from $\mathcal{M}$. Note that this can
be achieved because when geographic positions of all nodes



are known by the authority$^6$, inferring the one-hop neighbors
of a specific node can be easily achieved. Finally, for each
node in the query-result, the authority checks the consistency
of its sensor reading and the sensor readings of its one-hop
neighbors. The sensed data will be rejected if it is
inconsistent with the sensor readings of its neighbors. The
consistency checking procedure here may allow for certain
measurement errors or environmental factors, which should
be domain-specific and user-defined.

Now, we turn to address another problem: the subtree
proofs transmitted from the witness node to the authority
could be maliciously altered by the compromised (storage
or sensor) nodes. To deal with this kind of threat, we need
a mechanism by which the receiver not only can know
whether the received message is modified by the intermediate
nodes, but also can point out the node modifying the message
if it does exist. Motivated by the IP traceback techniques,
we develop a recursive traceback mechanism. More
specifically, each node $s_i$ on the path connecting $\mathcal{M}$ and
the witness node, after receiving $D$, where $D$ denotes the
subtree proof, from its descendant node, attaches $\mathrm{hash}_{K_{i,t}}(D)$
to $D$ and then forwards $D \Vert \mathrm{hash}_{K_{i,t}}(D)$ to its ascendant node
on the underlying aggregation tree. Note that it is assumed that
the witness node receives $D \Vert \emptyset$. Thus, with this modification, a
subtree proof received by the authority should be accompanied
by $\psi$ hashes, where $\psi$ is the number of nodes (including
storage nodes and the witness node itself) between a specific
witness node and the authority. Here, we should note that
because the topology of the aggregation tree is known by
the authority, when the subtree proof is sent, the IDs of
intermediate nodes except for the ID of the witness node
itself do not need to be attached$^7$. Hence, when the query-result
is deemed incomplete, before conducting the procedures
defined in the subtree sampling technique to attest nodes,
the authority checks whether the received subtree proof is
maliciously altered by an intermediate node. In particular,
assume that the subtree proof and its $\psi$ associated hashes,
$D \Vert h^D_1 \Vert \cdots \Vert h^D_\psi$, where $h^D_i$, $1 \leq i \leq \psi$, is the hash calculated
by the node that is $(i - 1)$ hops away from the witness node
and $h^D_1$ is computed by the witness node itself according to
its asserted subtree proof, are received by the authority. After
the reception of $D \Vert h^D_1 \Vert \cdots \Vert h^D_\psi$, the authority checks the
consistency of the hashes backward; i.e., it first checks $h^D_\psi$, and
then $h^D_{\psi-1}$, and so on. If such a verification succeeds
for all $\psi$ hashes, then the subtree proof $D$ is considered
to be intact. Otherwise, once the verification fails at, say, $h^D_i$,
we can conclude that the subtree proof was altered by the
corresponding node, since an innocent node is not assumed to
behave in such a way.
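The backward check can be illustrated with the following sketch. It assumes HMAC-SHA256 as the keyed hash and models each relay as hashing the proof it forwards; the key values, the tampering model, and the function names are hypothetical, and the sketch only locates the first failing hash, leaving the attribution of blame to the rule stated above.

```python
import hashlib
import hmac

def keyed_hash(key: bytes, message: bytes) -> bytes:
    """Stand-in for hash_K(.): HMAC-SHA256 (an assumption)."""
    return hmac.new(key, message, hashlib.sha256).digest()

def relay(proof: bytes, path_keys: list, tamper_at=None):
    """Forward a subtree proof from the witness (hop 0) toward the authority.
    Each hop attaches a keyed hash of the proof it forwards; the hop given
    by tamper_at alters the proof before attaching its own hash."""
    d, hashes = proof, []
    for hop, key in enumerate(path_keys):
        if hop == tamper_at:
            d = d + b"|altered"
        hashes.append(keyed_hash(key, d))
    return d, hashes

def first_failure(d: bytes, hashes: list, path_keys: list):
    """Check hashes backward (last hop first); return the index of the
    first hash that does not match the received proof, or None if intact."""
    for hop in range(len(hashes) - 1, -1, -1):
        expected = keyed_hash(path_keys[hop], d)
        if not hmac.compare_digest(hashes[hop], expected):
            return hop
    return None

path_keys = [b"k-witness", b"k-hop1", b"k-hop2", b"k-hop3"]
clean = first_failure(*relay(b"subtree-proof", path_keys), path_keys)
altered = first_failure(*relay(b"subtree-proof", path_keys, tamper_at=2), path_keys)
```

In this model, tampering at hop 2 leaves the hashes of hops 2 and 3 consistent with the received proof while the hash of hop 1 no longer matches, so the alteration is localized to the boundary between hops 1 and 2.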

6 As long as each node knows its position, it sends in a multihop manner
its position to the authority with the MAC constructed by the key uniquely
shared with the authority. This kind of operation is only performed once
after the sensor deployment.

7 The path from any sensor node in the aggregation tree to $\mathcal{M}$ is unique.
The authority can infer the nodes the subtree proof traverses once it is aware
of the ID of the witness node.
Recall that the communication cost of the original SRQ is
$O(N + d\log N)$. In the modified SRQ without those two
unrealistic assumptions, each node $s_i$ at epoch $t$ is required
to additionally send $\vartheta^0_{i,t}, \ldots, \vartheta^{m-1}_{i,t}$, leading to $O(N)$
communication cost. At epoch $t$, approximately $\xi_2(N - 1)/2^{\ell_h}$
witness nodes will send their subtree proofs to $\mathcal{M}$. Nevertheless,
when the subtree proofs are sent to $\mathcal{M}$, since the
additional hashes will be added, the communication cost will
be $O(N \cdot (\sqrt{N} + (\sqrt{N} - 1) + \cdots + 1))$. As a result, the
communication cost becomes $O(N^2 + d\log N)$.

Basically, STQ can be regarded as a special use of SRQ.
Thus, its resiliency against the false-incrimination attack is the
same as that of SRQ. Due to the nature of the top-$k$
query, the injection of a falsified sensor reading from
the compromised nodes will imply a significant query-result
deviation. Nevertheless, due to the use of the redundancy
property in resisting false data injection, STQ is
resilient against the collusion attacks and false-incrimination
attacks even when the compromised nodes can inject bogus
data and can manipulate the proofs. Finally, because of its
similarity to the SRQ scheme, STQ has the same communication
overhead as SRQ.

Impact of Sensor Node Compromise on SSQ. Recall
that the two aforementioned unrealistic assumptions are made. We
first consider the resiliency against false-incrimination attacks.
The simplest method is to introduce a parameter $\xi_3 < \xi_1$
so that $\mathcal{M}$ reports to the authority all the verification seeds,
instead of the hash of them as in the original SSQ. After that, for
each group, if at least $\xi_3$ out of $\xi_1$ verification seeds can
be successfully verified, then the quasi-skyline data in that
group are considered complete. Hence, the threat of false-incrimination
attacks will be mitigated because the adversary
is forced to send at least $\xi_1 - \xi_3 + 1$ false proofs, instead of
a single false proof. If even one verification seed from
a specific group fails to be verified, we can conclude that at
least one compromised sensor node exists in that group.
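The threshold rule can be written down directly. The sketch below is illustrative: the values $\xi_1 = 8$ and $\xi_3 = 4$ are borrowed from the setting of Fig. 7(a), and the function names and the sample outcome list are hypothetical.

```python
def group_complete(seed_checks, xi3):
    """seed_checks: one boolean per received verification seed
    (True = the authority verified it). The group's quasi-skyline data
    are accepted when at least xi3 seeds verify."""
    return sum(seed_checks) >= xi3

def group_has_compromised_node(seed_checks):
    """Any failed seed indicates a compromised sensor node in the group."""
    return not all(seed_checks)

def false_seeds_needed(xi1, xi3):
    """Minimum number of false seeds that make a complete reply look incomplete."""
    return xi1 - xi3 + 1

# xi_1 = 8 seeds received from one group, two of them bogus.
checks = [True, True, False, True, True, False, True, True]
accepted = group_complete(checks, xi3=4)
flagged = group_has_compromised_node(checks)
needed = false_seeds_needed(xi1=8, xi3=4)
```

With two bogus seeds the group's data are still accepted while the group is flagged as containing a compromised node; incriminating the storage node would require at least five false seeds in this setting.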

Now, we consider collusion attacks. In SSQ, both $\mathcal{M}$
and sensor nodes only know the quasi-skyline data. Recall
that quasi-skyline data are not necessarily the skyline data.
Thus, even when there is more than one compromised sensor
node in a group, the probability of successfully dropping the
skyline data is actually small. More specifically, to drop the
skyline data at a specified epoch, $\mathfrak{s}$ should contain at least $\xi_3$
sensor nodes responsible for sending hash values in order to
successfully forge a proof of incomplete quasi-skyline data,
and at the same time should be fortunate enough to select
groups whose quasi-skyline data contain skyline data$^8$.

Now, we consider both the collusion and false-incrimination
attacks. To drop the skyline data, the only thing $\mathcal{M}$ can do
is to drop the data of certain groups. Here, for simplicity, we
consider the case where $\mathcal{M}$ drops the data of a fixed group $G_\eta$,
$\eta \in [1, \mu]$, in which a set $\mathfrak{s}$ of $x$ sensor nodes is compromised.
To prevent the detection of the incomplete query-result, $\xi_3$ out of $x$

8 Definitely, $\mathcal{M}$ can simply drop all the sensed data. Nevertheless, under
this option, it is forced to forge at least $\mu(\xi_1 - \xi_3 + 1)$ proofs ($\mu = \sqrt{N}$ in
our analysis), leading to a high probability of being detected.



compromised sensor nodes should be the sensor nodes responsible
for sending the proofs. The probability that at least $\xi_3$ out
of the $\xi_1$ nodes responsible for sending the proofs are contained
in $\mathfrak{s}$ is $p_1 = \sum_{j=\xi_3}^{\xi_1} \binom{x}{j}\binom{|G_\eta| - x}{\xi_1 - j} / \binom{|G_\eta|}{\xi_1}$. In other words,
this is equal to the probability that the compromised sensor
nodes can drop the quasi-skyline data without being detected.
Quasi-skyline data, however, are not necessarily equivalent to
the skyline data. The probability that the quasi-skyline data
dropped by compromised nodes indeed contain skyline data
can be represented as p 2 = L Q ]y/n- Pc y) I {\gJy/n)' where 
$p_c$ is the average ratio of the skyline data to all the sensed data.
In short, this is equal to the probability that the operations performed
by compromised nodes cause the loss of skyline data.
As a whole, even if the adversary has $x$ compromised nodes in
$G_\eta$, the probability of successfully making the skyline query-result
incomplete is merely $p_1 \cdot p_2$. The trends of such a probability
are depicted in Fig. 7 under different parameter settings. As
shown in Fig. 7a, $p_1 \cdot p_2$ decreases with an increase of
$N$ and $Y$. This is because when $N$ and $Y$ become larger, it
is more unlikely that the quasi-skyline data dropped by the
adversary contain the skyline data. Nevertheless, as shown in
Fig. 7b, $p_1 \cdot p_2$ increases with an increase of $(\xi_1 - \xi_3)$. This
is because when $(\xi_1 - \xi_3)$ becomes larger, it is more likely
that $\mathfrak{s}$ can forge a proof of incomplete quasi-skyline data.
As a whole, from the false-incrimination attack point of view,
a larger $(\xi_1 - \xi_3)$ is preferable because the adversary must send
more false proofs, but from the collusion attack point of view,
a smaller $(\xi_1 - \xi_3)$ is preferable because it lowers $p_1 \cdot p_2$.
This would be an optimization problem that
deserves further study. In addition, compared with the original
SSQ, the additional communication cost incurred by the
modified SSQ comes from the transmission of verification
seeds from $\mathcal{M}$ to the authority. Thus, the communication cost
of the modified SSQ scheme remains $O(N^{3/2} + Nd)$.




Fig. 7: The probability of successfully dropping skyline data when (a)
$\xi_1 = 8$ and $\xi_3 = 4$, and (b) $N = 500$ and $Y = 100$.

Now, the two unrealistic assumptions are relaxed so that
the compromised nodes can provide falsified sensor readings
and contaminate the proofs. Like the top-k query, the
skyline query is vulnerable to falsified data. That is,
a compromised node injecting a falsified extreme sensor
reading can gain the effect of deleting all the data sensed by
the other sensor nodes. Thus, the redundancy property is still
required in our SSQ scheme so as to detect
false data injection. Because the use of the redundancy
property in SSQ is similar to its use in SRQ, we omit
the detailed description as well. In addition, the adversary
may compromise the sensor nodes near the innocent $\mathcal{M}$ so
that it can falsify all the verification seeds. Therefore, our
recursive traceback mechanism must also be applied
to SSQ to secure the verification seeds. Because of these two
additional changes in SSQ, the communication cost becomes
$O(N^2 + Nd)$.

VI. Conclusion 

We propose schemes for securing range query, top-k query,
and skyline query, respectively. Two critical performance
metrics, detection probability and communication cost, are
analyzed. In particular, the performance of SRQ is superior
to all the prior works, while STQ and SSQ act as the
first proposals for securing top-k query and skyline query,
respectively, in tiered sensor networks. We also investigate the
security impact of collusion attacks and newly identified
false-incrimination attacks, and explore the resiliency of the proposed
schemes against these two attacks.
Abstract: In this paper, aiming at securing range query, top-k
query, and skyline query in tiered sensor networks, we propose
the Secure Range Query (SRQ), Secure Top-k Query (STQ), and
Secure Skyline Query (SSQ) schemes, respectively. In particular,
SRQ, by using our proposed prime aggregation technique, has
the lowest communication overhead among prior works, while
STQ and SSQ, to our knowledge, are the first proposals in
tiered sensor networks for securing top-k and skyline queries,
respectively. Moreover, the relatively unexplored issue of the 
security impact of sensor node compromises on multidimensional 
queries is studied; two attacks incurred from the sensor node 
compromises, collusion attack and false-incrimination attack, are 
investigated in this paper. After developing a novel technique 
called subtree sampling, we also explore methods of efficiently 
mitigating the threat of sensor node compromises. Performance 
analyses regarding the probability for detecting incomplete 
query-results and communication cost of the proposed schemes 
are also studied. 

I. Introduction 

Tiered Sensor Networks. Sensor networks are expected to
be deployed in harsh or hostile regions for data collection
or environment monitoring. Since there is the possibility
of no stable connection between the authority and the network, 
in-network storage is necessary for caching or storing the 
data sensed by sensor nodes. A straightforward method is to 
attach external storage to each node, but this is economically 
infeasible. Therefore, various data storage models for sensor 
networks have been studied in the literature. In [6], [20], a 
notion of tiered sensor networks was discussed by introducing 
an intermediate tier between the authority and the sensor 
nodes. The purpose of this tier is to cache the sensed data 
so that the authority can efficiently retrieve the cache data, 
avoiding unnecessary communication with sensor nodes. 

The network model considered in this paper is the same 
as the ones in [6], [20]. More specifically, some storage- 
abundant nodes, called storage nodes, which are equipped with 
several gigabytes of NAND flash storage [22], are deployed 
as the intermediate tier for data archival and query response. 
In practice, some currently available sensor nodes such as 
RISE [19] and StarGate [27] can work as the storage nodes. 
The performance of sensor networks wherein external flash 
memory is attached to the sensor nodes was also studied in 
[14]. In addition, some theoretical issues concerning the tiered 
sensor networks, such as the optimal storage node placement, 
were also studied in [22], [28]. In fact, such a two-tiered
network architecture has been demonstrated to be useful in
increasing network capacity and scalability, reducing network
management complexity, and prolonging network lifetime.

Multidimensional Queries. Although a large amount of
sensed data can be stored in storage nodes, the authority
might be interested in only some portions of them. To this
end, the authority issues proper queries to retrieve the desired
portion of sensed data. Note that, when the sensed data have
multiple attributes, the query could be multidimensional. We
have observed that range query, top-k query, and skyline query
are the most commonly used queries. Range query [11], [16],
which could be useful for correlating events occurring within
the network, is used to retrieve sensed data whose attributes
are individually within a specified range. After mapping the
sensed data to a ranking value, top-k query [30], which can be
used to extract or observe extreme phenomena, is used to
retrieve the sensed data whose ranking values are among the
top k. Skyline query [5], [10], due to its promising
application in multi-criteria decision making, is also useful and
important in environment monitoring, industry control, etc.

Nonetheless, in the tiered network model, the storage nodes 
become the targets that are easily compromised because of 
their significant roles in responding to queries. For example, 
the adversary can eavesdrop on the communications among 
nodes or compromise the storage nodes to obtain the sensed 
data, resulting in the breach of data confidentiality. After the 
compromise of storage nodes, the adversary can also return 
falsified query-results to the authority, leading to the breach 
of query-result authenticity. Even more, the compromised 
storage nodes can cause query-result incompleteness, creating 
an incomplete query-result for the authority by dropping some 
portions of the query-result. 

Related Work. Secure range queries in tiered sensor 
networks have been studied only in [21], [29], [34]. Data 
confidentiality and query-result authenticity can be preserved 
very well in [21], [29], [34] owing to the use of the bucket 
scheme [8], [9]. Unfortunately, encoding approach [21] is only 
suitable for the one-dimensional query scenario in the sensor 
networks for environment monitoring purposes. On the other 
hand, crosscheck approaches [29], [34] can be applied on 
sensor networks for event-driven purposes at the expense of the 
reduced probability for detecting query-result incompleteness. 
The security issues incurred from the compromise of storage 
nodes have been addressed in [21], [29], [34]. The impact of 
collusion attacks defined as the collusion among compromised 
sensor nodes and compromised storage nodes, however, was 
only discussed in [34], wherein only a naive method was
proposed as a countermeasure. When the compromised sensor 
nodes are taken into account, a Denial-of-Service attack, called 
false-incrimination attack, not addressed in the literature, can 
be extremely harmful. In such an attack, the compromised 
sensor nodes subvert the functionality of the secure query 
schemes by simply claiming that their sensed data have been 
dropped by the storage nodes. After that, the innocent storage 
nodes will be considered compromised and will be revoked by 
the authority. It should be noted that all the previous solutions 
suffer from false-incrimination attacks. 

Contribution. Our major contributions are: 

• The Secure Range Query (SRQ) scheme is proposed to
secure the range query in tiered networks (Sec. III-A).
By taking advantage of our proposed prime aggregation
technique for securely transmitting the amount of data
in specified buckets, SRQ has the lowest communication
cost among prior works in all scenarios (environment
monitoring and event detection purposes), while keeping
the probability of detecting incomplete query-results
close to 1. It should be noted that although incorporating
the bucket scheme [8], [9] (described in Sec. III-A1) in the
protocol design [21], [29], [34] is not new, the novelty
of our method lies in the use of prime aggregation
in reducing the overhead and guaranteeing query-result
completeness.

• For the first time in the literature, the issues of securing
top-k and skyline queries in tiered networks are studied
(Secs. III-B and III-C). Our solutions to these two issues
are Secure Top-k Query (STQ) and Secure Skyline Query
(SSQ), respectively. The former is built upon the proposed
SRQ scheme to detect query-result incompleteness,
while the efficiency of SSQ is based on our proposed
grouping technique.

• The security impact of sensor compromises is studied
(Sec. V); collusion attack is formally addressed, and a
new Denial-of-Service attack, false-incrimination attack, 
which can thwart the security purpose in prior works, 
is first identified in our paper. The resiliency of SRQ, 
STQ, and SSQ against these two attacks is investigated. 
With a novel technique called subtree sampling, some 
minor modifications are introduced for SRQ and STQ 
as countermeasures to these two attacks. Moreover, the 
compromised nodes can even be efficiently identified and 
be further attested [23], [24], [26]. 

II. System Model 

In general, the models used in this paper are very similar 
to those in [6], [20], [21], [29], [34]. 

Network Model. As shown in Fig. 1, the sensor network
considered in this paper is composed of a large number of
resource-constrained sensor nodes and a few so-called storage
nodes. Storage nodes are assumed to be storage-abundant and
may be compromised. In addition, in certain cases, storage
nodes could also have abundant resources in energy, computation,
and communication. The storage nodes can communicate
with the authority via direct or multi-hop communications. The
network is connected such that, for two arbitrary nodes, at least
one path connecting them can be found.

Fig. 1: A tiered sensor network.

A cell is composed of a storage node and a number of
sensor nodes. In a cell, sensor nodes could be far away
from the associated storage node so that they can communicate
with each other only through multi-hop communication.
For example, in Fig. 1, without the relay of the
gray node, the black node cannot reach the storage node.
The nodes in the network have synchronized clocks [25] and
the time is divided into epochs. As in [21], [22], [29], [34],
each node is assumed to be aware of its geographic position
[13], [33] so that the association between the sensor node and
storage node can be established. As a matter of fact, information
about the time and geographic position is indispensable for
most sensor network applications.

For each cell, aggregation is assumed to be performed 
over an aggregation tree rooted at the storage node. Since 
the optimization of the aggregation tree structure is out of 
the scope of this paper, we adopt the method described in 
TAG [15] to construct an aggregation tree. We follow the 
conventional assumption that the topology of the aggregation 
tree is known by the authority [2], [4]. Similar to sensor nodes, 
storage nodes also perform the sensing task. Each sensor node 
senses the data and temporarily stores the sensed data in its 
local memory within an epoch. At the end of each epoch, 
the sensor nodes in a cell report the sensed data stored in 
local memory to the associated storage node. Throughout this
paper, we focus on a cell $C$, composed of $N - 1$ sensor nodes,
$\{s_i\}_{i=1}^{N-1}$, and a storage node $\mathcal{M}$.

Security Model. We consider the adversary who can com- 
promise an arbitrary number of storage nodes. After node 
compromises, all the information stored in the compromised 
storage nodes will be exposed to the adversary. The goal 
of the adversary is to breach at least one of the following: 
data confidentiality, query-result authenticity, and query-result 
completeness. We temporarily do not consider the compromise
of sensor nodes in describing SRQ, STQ, and SSQ in Sec. III.
The impact of sensor node compromise on the security breach,
however, will be explored in Sec. V. Many security issues
in sensor networks, such as key management [3], [7], [31],
broadcast authentication [12], [17], and secure localization
[13], [33], have been studied in the literature. This paper
focuses on securing multidimensional queries, which are relatively
unexplored in the literature, while the protocol design for the
aforementioned issues is beyond the scope of this paper.

Query Model. The sensed data can be represented as a
$d$-dimensional tuple, $(A_1, A_2, \ldots, A_d)$, where $A_g$,
$\forall g \in [1, d]$, denotes the $g$-th attribute. The authority may
issue a proper $d$-dimensional query to retrieve the desired portion
of data stored in storage nodes. Three types of queries, namely range
query, top-k query, and skyline query, are considered in this paper.
For range query, its form, issued by the authority, is expressed
as $(C, t, l_1, h_1, \ldots, l_d, h_d)$, which means that the sensed data
to be reported to the authority should be generated by the nodes
in cell $C$ at epoch $t$, and their $g$-th attributes, $A_g$'s, should be
within the range $[l_g, h_g]$, $g \in [1, d]$. Top-k query is usually
associated with a scalar (linear) ranking function. With a ranking
function, $R$, the sensed data, even if multidimensional, can
be individually mapped to a one-dimensional ranking value.
The top-k query issued by the authority is in the form of
$(C, t, R, k)$. As the first attempt to achieve secure top-k query,
the goal of top-k query in this paper is simply assumed to be
obtaining the sensed data generated by the nodes in cell $C$ at epoch
$t$ with the $k$ smallest ranking values. For skyline query,
the desired skyline data are defined as those not dominated by
any other data. Assuming that smaller values are preferable to
larger ones for all attributes, for a set of $d$-dimensional data, a
datum $c_i$ dominates another datum $c_j$ if both conditions,
$A_g(c_i) \le A_g(c_j)$, $\forall g \in [1, d]$, and $A_g(c_i) < A_g(c_j)$,
$\exists g \in [1, d]$, hold, where $A_g(c_i)$ denotes the $g$-th attribute
value of the datum $c_i$. Hence, the form of the skyline query issued
by the authority is given as $(C, t)$, which is used to retrieve
the skyline data generated in cell $C$ at epoch $t$.
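
The dominance relation and the resulting skyline can be illustrated with a short sketch. It is only a plain quadratic scan under the "smaller is better" assumption stated above; the function names `dominates` and `skyline` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(ci: Sequence[float], cj: Sequence[float]) -> bool:
    """True if ci dominates cj: no worse in every attribute, strictly better in one."""
    no_worse = all(a <= b for a, b in zip(ci, cj))
    strictly_better = any(a < b for a, b in zip(ci, cj))
    return no_worse and strictly_better


def skyline(data: List[Point]) -> List[Point]:
    """Return the data that are not dominated by any other datum."""
    return [c for c in data if not any(dominates(o, c) for o in data)]


points = [(1, 3), (2, 4), (2, 1), (3, 3)]
result = skyline(points)  # (2, 4) and (3, 3) are dominated by (1, 3)
```

Identical data do not dominate each other under this definition, so duplicates of a skyline datum all remain in the result.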

III. Securing Multidimensional Queries 

In this section, aiming at securing range query, top-k query,
and skyline query, we propose the SRQ (Sec. III-A), STQ
(Sec. III-B), and SSQ (Sec. III-C) schemes, respectively. Note
that though SRQ, STQ, and SSQ use the bucket scheme [8],
[9], their novelty lies in their design for efficiently
detecting the incomplete query-result (described later).

A. Securing Range Queries (SRQ) 

Our proposed SRQ scheme consists of a confidentiality-preserving
reporting phase (Sec. III-A1) that can simultaneously
prevent the adversary from accessing data stored in
the storage nodes, authenticate the query results, and ensure
efficient multidimensional query processing, and a query-result
completeness verification phase (Sec. III-A2) for guaranteeing
the completeness of query-results.

1) Confidentiality-preserving reporting: Data encryption is
a straightforward and common method of ensuring data
confidentiality against a compromised storage node. Moreover, we
hope that even when the adversary compromises the storage
node, the previously stored information should not be exposed
to the adversary. To this end, the keys used in encryption
should be selected from a one-way hash chain. In particular,
assume that a key $K_{i,0}$ is initially stored in sensor node $s_i$.
At the beginning of epoch $t$, the key $K_{i,t}$, which is used only
within epoch $t$, is calculated as $hash(K_{i,t-1})$, where $hash(\cdot)$
is a hash function, and $K_{i,t-1}$ is dropped. Suppose that sensor
node $s_i$ has sensed data $D$ at epoch $t$. One method for storing
$D$ in the storage node $\mathcal{M}$ while preserving privacy is to
send $\{D\}_{K_{i,t}}$, which denotes the encryption of $D$ with the
key $K_{i,t}$. With this method, when an OCB-like authenticated
encryption primitive [18] is exploited, the authenticity of $D$
can be guaranteed. At the same time, $D$ will not be known
by the adversary during message forwarding and even after
the compromise of the storage node at epoch $t$ because the
adversary cannot recover the keys used before
epoch $t$. Nevertheless, no query can be answered by $\mathcal{M}$ if
only encrypted data are stored in $\mathcal{M}$. Hence, the bucket scheme
proposed in [8], [9], which uses the encryption keys generated
via a one-way hash chain, is used in the SRQ scheme.
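
As a rough illustration of the per-epoch key update, the sketch below derives each epoch key by hashing the previous one and keeps only the current key on the sensor side. SHA-256 is an assumed stand-in for the unspecified hash function, and the names `next_epoch_key`, `key_at_epoch`, and `SensorKeyState` are hypothetical.

```python
import hashlib


def next_epoch_key(prev_key: bytes) -> bytes:
    """K_{i,t} = hash(K_{i,t-1}); SHA-256 stands in for the unspecified hash."""
    return hashlib.sha256(prev_key).digest()


def key_at_epoch(initial_key: bytes, t: int) -> bytes:
    """Authority-side derivation of K_{i,t} from K_{i,0} by hashing t times."""
    key = initial_key
    for _ in range(t):
        key = next_epoch_key(key)
    return key


class SensorKeyState:
    """Sensor-side state: only the key of the current epoch is kept."""

    def __init__(self, initial_key: bytes) -> None:
        self.key = initial_key
        self.epoch = 0

    def advance(self) -> None:
        # Overwrite the old key so that keys of earlier epochs are no longer stored.
        self.key = next_epoch_key(self.key)
        self.epoch += 1
```

Because the authority holds the initial key, it can recompute the key of any epoch on demand, whereas a party that only sees the current key cannot invert the hash to obtain earlier ones.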

In the bucket scheme, the domain of each attribute $A_g$,
$\forall g \in [1, d]$, is assumed to be known in advance, and is
divided into $w_g \ge 1$ consecutive non-overlapping intervals
sequentially indexed from 1 to $w_g$, under a publicly known
partitioning rule. For ease of representation, in the following,
we assume that $w_g = w$, $\forall g \in [1, d]$. A $d$-dimensional bucket
is defined as a tuple, $(v_1, v_2, \ldots, v_d)$ (hereafter called bucket
ID), where $v_g \in [1, w]$, $g \in [1, d]$. The sensor node $s_i$, when
it has sensed data at epoch $t$, sends to $\mathcal{M}$ the corresponding
bucket IDs, which are constructed by mapping each attribute
of the sensed data to the proper interval index, and the sensed
data encrypted by the key $K_{i,t}$. For example, when $s_i$ has
sensed data (1, 3), (2, 4), and (2, 11) at epoch $t$, the message
transmitted to the storage node at the end of epoch $t$ is
$(i, t, (1,1), \{(1,3), (2,4)\}_{K_{i,t}}, (1,2), \{(2,11)\}_{K_{i,t}})$, assuming
that $A_1, A_2 \in [1, 20]$, $w = 2$, and each interval has the same
length of 10.
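
The mapping from sensed data to bucket IDs in this example can be reproduced with a small sketch. It assumes integer attributes, a common domain [lo, hi] for every attribute, and equal-length intervals; the helper names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Datum = Tuple[int, ...]
Bucket = Tuple[int, ...]


def bucket_id(datum: Datum, lo: int, hi: int, w: int) -> Bucket:
    """Map each attribute to its 1-based interval index (equal-length intervals)."""
    length = (hi - lo + 1) // w  # e.g. 10 for the domain [1, 20] and w = 2
    return tuple(min((a - lo) // length + 1, w) for a in datum)


def group_by_bucket(data: List[Datum], lo: int, hi: int, w: int) -> Dict[Bucket, List[Datum]]:
    """Group the sensed data of one node by bucket ID."""
    groups: Dict[Bucket, List[Datum]] = defaultdict(list)
    for datum in data:
        groups[bucket_id(datum, lo, hi, w)].append(datum)
    return dict(groups)


groups = group_by_bucket([(1, 3), (2, 4), (2, 11)], lo=1, hi=20, w=2)
# groups == {(1, 1): [(1, 3), (2, 4)], (1, 2): [(2, 11)]}
```

Each group would then be encrypted under the current epoch key and sent together with its bucket ID; the encryption step is omitted here.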

Let $\mathcal{V}$ be the set of all possible bucket IDs. Assume that
there are on average $Y$ and $Y/N$ data generated in a cell and
in a node, respectively, at epoch $t$. Assume that $D_{i,t,V}$ is a
set containing all the data within the bucket $V \in \mathcal{V}$ sensed
by $s_i$ at epoch $t$. The messages sent from $s_i$ to $\mathcal{M}$ at the
end of epoch $t$ can be abstracted as $(i, t, J_\sigma, \{D_{i,t,J_\sigma}\}_{K_{i,t}})$,
where $J_\sigma \in \mathcal{V}$, $J_\sigma \neq J_{\sigma'}$, $1 \le \sigma, \sigma' \le Y/N$, if there are some
data sensed by $s_i$ within epoch $t$. Note that $s_i$ sends nothing
to $\mathcal{M}$ if $D_{i,t,J_\sigma}$, $\forall J_\sigma \in \mathcal{V}$, are empty. After that, $\mathcal{M}$ can
answer the range query according to the information revealed
by the bucket IDs. Assume that $l_g$ and $h_g$ are located within
the $\alpha_g$-th and $\beta_g$-th intervals, respectively, where $\alpha_g \le \beta_g$,
$\alpha_g, \beta_g \in [1, w]$, and $g \in [1, d]$. The encrypted data falling into the
buckets in the set $\mathcal{A} = \{(\rho_1, \ldots, \rho_d) \mid \alpha_g \le \rho_g \le \beta_g, g \in [1, d]\}$
are reported to the authority. In other words, once
receiving the range query, $\mathcal{M}$ first translates the information
$l_1, h_1, \ldots, l_d, h_d$ into the proper bucket IDs and then replies with
all the encrypted data falling into the buckets in $\mathcal{A}$ (see footnote 1).
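
The translation of a range query into the set of bucket IDs to be reported can be sketched as follows. As a simplifying assumption, all attributes share one integer domain [lo, hi] with equal-length intervals; `query_buckets` and `interval_index` are hypothetical helper names.

```python
from itertools import product
from typing import List, Set, Tuple


def interval_index(value: int, lo: int, length: int, w: int) -> int:
    """1-based index of the interval that contains value."""
    return min((value - lo) // length + 1, w)


def query_buckets(ranges: List[Tuple[int, int]], lo: int, hi: int, w: int) -> Set[Tuple[int, ...]]:
    """Return every bucket ID whose g-th index lies between the indices of l_g and h_g."""
    length = (hi - lo + 1) // w
    per_attribute = []
    for l_g, h_g in ranges:
        alpha = interval_index(l_g, lo, length, w)
        beta = interval_index(h_g, lo, length, w)
        per_attribute.append(range(alpha, beta + 1))
    return set(product(*per_attribute))


# Query 3 <= A_1 <= 8 and 5 <= A_2 <= 15 over the domain [1, 20] with w = 2.
buckets = query_buckets([(3, 8), (5, 15)], lo=1, hi=20, w=2)
# buckets == {(1, 1), (1, 2)}
```

The returned set is a superset of the exact answer: every datum in the selected buckets is reported, and the authority filters out the superfluous ones after decryption.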

Nevertheless, in tiered sensor networks, even when the
original bucket scheme is used, $\mathcal{M}$ could still maliciously
drop some encrypted data and only report part of the results
to the authority, resulting in an incomplete query-result. In
the following, we will describe an extended bucket scheme,
which incorporates the prime aggregation strategy into the
original bucket scheme, to detect the incomplete reply in a
communication-efficient manner.

1 There is a tradeoff between the communication cost and confidentiality
in terms of bucket sizes because a larger bucket size implies higher data
confidentiality and higher communication cost due to more superfluous data
being returned to the authority. The design of optimal bucketing strategies is
beyond the scope of this paper, and we refer to [8], [9] for more details.

2) Query-Result Completeness Verification: With the prime
aggregation technique, SRQ detects an incomplete reply by
taking advantage of aggregation for counting the amount of
sensed data falling into specified buckets. Together with a
hash for verification purposes, the count forms a so-called
proof for detecting an incomplete reply. The storage node $\mathcal{M}$
is required to provide to the authority the proof for the epoch
specified in the query so that the authority can use the proof
to verify the completeness of received query-results. Since in
our design all the sub-proofs generated by the nodes can be
aggregated to yield the final proof, the communication cost can
be significantly reduced. The details are described as follows.

Assume that an aggregation tree [15] has been constructed
after sensor deployment. Recall that the domain of attribute $A_g$
is divided into $w$ intervals. Before the sensor deployment, a
set $\{p_i^V, p_i^{\emptyset} \mid i \in [1, N-1], V \in \mathcal{V}\}$ of $(w^d + 1)(N - 1)$ prime
numbers is selected by the authority such that $p_i^V \neq p_{i'}^{V'}$ and
$p_i^{\emptyset} \neq p_{i'}^{\emptyset}$ if $i \neq i'$ or $V \neq V'$. Then, the set
$\{p_i^V, p_i^{\emptyset} \mid V \in \mathcal{V}\}$ of $w^d + 1$ prime numbers, called the set of
bucket primes of $s_i$, is stored in each sensor node $s_i$. In addition,
a set $\{k_{i,0}^V, k_{i,0}^{\emptyset} \mid V \in \mathcal{V}\}$ of $w^d + 1$ keys is selected by the
authority and is stored in each sensor node $s_i$ initially. For fixed
$i$ and $t$, the set $\{k_{i,t}^V, k_{i,t}^{\emptyset} \mid V \in \mathcal{V}\}$ is called the set of
bucket keys of $s_i$ at epoch $t$. Bucket primes could be publicly known,
while bucket keys should be kept secret. Each sensor node $s_i$, at the
beginning of epoch $t$, calculates $k_{i,t}^V = hash(k_{i,t-1}^V)$ and then
drops $k_{i,t-1}^V$, $\forall V \in \mathcal{V}$. In addition, $s_i$ also calculates
$k_{i,t}^{\emptyset} = hash(k_{i,t-1}^{\emptyset})$ and then drops $k_{i,t-1}^{\emptyset}$.

Recall that each node $s_i$ on average has $Y/N$ sensed data
at epoch $t$, and assume that the set of $Y/N$ bucket IDs
associated with these $Y/N$ sensed data is
$B_{i,t} = \{v^{i,t,\sigma} \mid \sigma = 1, \ldots, Y/N\}$, which could be a multiset.
Then, according to its sensed data, $s_i$ calculates
$H_{i,t} = hash_{K_{i,t}}(\sum_{\sigma=1}^{Y/N} k_{i,t}^{v^{i,t,\sigma}})$,
where $hash_K(\cdot)$ denotes the keyed hash function with key
$K$, if it has sensed data, and $H_{i,t} = hash_{K_{i,t}}(k_{i,t}^{\emptyset})$
otherwise. Moreover, $s_i$ computes
$P_{i,t} = \prod_{\sigma=1}^{Y/N} p_i^{v^{i,t,\sigma}}$ if it
has sensed data, and $P_{i,t} = p_i^{\emptyset}$ otherwise. Moreover, once
receiving $(j_\rho, t, \mathcal{E}_{j_\rho,t}, \mathcal{B}_{j_\rho,t}, \mathcal{H}_{j_\rho,t}, \mathcal{P}_{j_\rho,t})$,
$j_\rho \in [1, N-1]$, $\forall \rho \in [1, \chi]$, from its $\chi$ children,
$s_{j_1}, \ldots, s_{j_\chi}$, $s_i$ calculates
$\mathcal{E}_{i,t} = (\bigcup_{\rho=1}^{\chi} \mathcal{E}_{j_\rho,t}) \cup E_{i,t}$, where $E_{i,t}$
denotes the set of encrypted data sensed by $s_i$ at epoch $t$, and
$\mathcal{B}_{i,t} = (\bigcup_{\rho=1}^{\chi} \mathcal{B}_{j_\rho,t}) \cup B_{i,t}$,
where $B_{i,t}$ denotes the set of bucket IDs of $E_{i,t}$. In addition,
$s_i$ also calculates
$\mathcal{H}_{i,t} = hash_{K_{i,t}}(\sum_{\rho=1}^{\chi} \mathcal{H}_{j_\rho,t} + H_{i,t})$ and
$\mathcal{P}_{i,t} = \prod_{\rho=1}^{\chi} \mathcal{P}_{j_\rho,t} \cdot P_{i,t}$. Finally, $s_i$ reports
$(i, t, \mathcal{E}_{i,t}, \mathcal{B}_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t})$ to its parent node on the aggregation
tree. Note that, if $s_i$ is a leaf node on the aggregation tree,
then we assume that it receives (0, 0, 0, 0, 0, 1).
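
The way sub-proofs combine along the aggregation tree can be sketched roughly as below. The sketch makes several assumptions that are not fixed by the text: HMAC-SHA256 plays the role of the keyed hash, bucket keys and hash values are treated as integers so that they can be summed, and the empty tuple `EMPTY` stands for the "no data" bucket. All function names are illustrative.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List, Tuple

Bucket = Tuple[int, ...]
EMPTY: Bucket = ()  # assumed marker for the "no data" bucket


def keyed_hash(key: bytes, value: int) -> int:
    """hash_K(value) as an integer; HMAC-SHA256 is an assumed instantiation."""
    msg = value.to_bytes((value.bit_length() + 7) // 8 or 1, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest(), "big")


def own_subproof(epoch_key: bytes, bucket_keys: Dict[Bucket, int],
                 bucket_primes: Dict[Bucket, int],
                 buckets: Iterable[Bucket]) -> Tuple[int, int]:
    """Return (H, P) for the multiset of bucket IDs sensed by one node."""
    ids = list(buckets) or [EMPTY]
    h = keyed_hash(epoch_key, sum(bucket_keys[b] for b in ids))
    p = 1
    for b in ids:
        p *= bucket_primes[b]  # one prime factor per sensed datum
    return h, p


def aggregate(epoch_key: bytes, own: Tuple[int, int],
              children: List[Tuple[int, int]]) -> Tuple[int, int]:
    """Combine a node's own sub-proof with the sub-proofs reported by its children."""
    h_own, p_own = own
    h = keyed_hash(epoch_key, sum(h_c for h_c, _ in children) + h_own)
    p = p_own
    for _, p_c in children:
        p *= p_c
    return h, p
```

A leaf simply calls `aggregate` with an empty list of children, which matches the convention of receiving a hash of 0 and a product of 1.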

Assume that the set $\{p_{\mathcal{M}}^V, p_{\mathcal{M}}^{\emptyset} \mid V \in \mathcal{V}\}$ of $w^d + 1$ prime
numbers stored in $\mathcal{M}$ are all different from those stored in
sensor nodes, and the set $\{k_{\mathcal{M},0}^V, k_{\mathcal{M},0}^{\emptyset} \mid V \in \mathcal{V}\}$ of $w^d + 1$
bucket keys is selected by the authority and stored in $\mathcal{M}$. $\mathcal{M}$
computes $k_{\mathcal{M},t}^V = hash(k_{\mathcal{M},t-1}^V)$ and drops $k_{\mathcal{M},t-1}^V$ at epoch
$t$. In addition, $\mathcal{M}$ also computes $k_{\mathcal{M},t}^{\emptyset} = hash(k_{\mathcal{M},t-1}^{\emptyset})$ and
drops $k_{\mathcal{M},t-1}^{\emptyset}$ at epoch $t$. The storage node $\mathcal{M}$ can also
calculate $E_{\mathcal{M},t}$, $B_{\mathcal{M},t}$, $H_{\mathcal{M},t}$, and $P_{\mathcal{M},t}$ according to its
own sensed data at epoch $t$. In fact, the procedures $\mathcal{M}$ needs to
perform after messages are received from the child nodes are
the same as the ones performed by the sensor nodes. Acting
as the root of the aggregation tree, however, $\mathcal{M}$ keeps the
aggregated results, which are denoted as $\mathcal{E}_{\mathcal{M},t}$, $\mathcal{B}_{\mathcal{M},t}$, $\mathcal{H}_{\mathcal{M},t}$,
and $\mathcal{P}_{\mathcal{M},t}$, respectively, in its local storage and waits for the
query issued by the authority. Note that $\mathcal{P}_{\mathcal{M},t}$ can be thought
of as a compact summary of the sensed data of the whole
network and can be very useful for the authority in checking
the completeness of the query-result, while $\mathcal{H}_{\mathcal{M},t}$ can be used
by the authority to verify the authenticity of $\mathcal{P}_{\mathcal{M},t}$.

Assume that a range query $(C, t, l_1, h_1, l_2, h_2, \ldots, l_d, h_d)$
is issued by the authority. The encrypted data falling into
the buckets in the set $\mathcal{A}$, along with the proof composed
of $\mathcal{H}_{\mathcal{M},t}$ and $\mathcal{P}_{\mathcal{M},t}$, are sent to the authority. Once $\mathcal{P}_{\mathcal{M},t}$
is received, the authority immediately performs the prime
factor decomposition of $\mathcal{P}_{\mathcal{M},t}$. Due to the construction of
$\{p_i^V, p_{\mathcal{M}}^V, p_i^{\emptyset}, p_{\mathcal{M}}^{\emptyset} \mid i \in [1, N-1], V \in \mathcal{V}\}$, which guarantees
that the bucket primes are all distinct, after the prime factor
decomposition of $\mathcal{P}_{\mathcal{M},t}$, the authority can be aware of which
node contributes which data within specified buckets. As a
result, the authority can know which keys should be used to
verify the authenticity and integrity of $\mathcal{H}_{\mathcal{M},t}$. More specifically,
assume that $\mathcal{P}_{\mathcal{M},t} = (p_1)^{a_1} \cdots (p_\gamma)^{a_\gamma}$, $a_1, \ldots, a_\gamma > 0$,
$\gamma > 0$, and that $p_1, \ldots, p_\gamma$ are distinct prime numbers. From
the construction of $\mathcal{P}_{\mathcal{M},t}$, we know that $(p_\kappa)^{a_\kappa}$, for $\kappa \in [1, \gamma]$,
is equal to $(p_{\kappa'}^{\kappa''})^{a_\kappa}$, for some $\kappa' \in [1, N-1]$ and
$\kappa'' \in \mathcal{V} \cup \{\emptyset\}$. From
the procedure performed by each node, it can also be known
that the appearance of $(p_\kappa)^{a_\kappa} = (p_{\kappa'}^{\kappa''})^{a_\kappa}$ in $\mathcal{P}_{\mathcal{M},t}$ means that
at epoch $t$ the sensor node $s_{\kappa'}$ produces $a_\kappa$ data falling into
bucket $\kappa''$, contributing the bucket key $k_{\kappa',t}^{\kappa''}$ in total $a_\kappa$ times in
$\mathcal{H}_{\mathcal{M},t}$. Here, the sensor node producing the data falling into
the bucket $\emptyset$ means that $s_{\kappa'}$ senses nothing. Thus, we can infer
the total amount of data falling into specified buckets at epoch
$t$. Recall that the authority is aware of the topology of the
aggregation tree. Thus, after the prime factor decomposition
of $\mathcal{P}_{\mathcal{M},t}$, the authority can reconstruct $\mathcal{H}_{\mathcal{M},t}$ according to the
derived $a_\kappa$'s and $p_\kappa$'s by its own effort, because it knows $K_{i,t}$
and $k_{i,t}^V$, $\forall i \in \{1, \ldots, N-1, \mathcal{M}\}$, $\forall t \ge 0$, $\forall V \in \mathcal{V}$. Therefore,
the received $\mathcal{P}_{\mathcal{M},t}$ is considered authentic if and only if the
$\mathcal{H}_{\mathcal{M},t}$ reconstructed by the authority is equal
to the received $\mathcal{H}_{\mathcal{M},t}$. When the verification of $\mathcal{P}_{\mathcal{M},t}$ fails, $\mathcal{M}$
is considered compromised. When the verification of $\mathcal{P}_{\mathcal{M},t}$ is
successful, the authority decrypts all the received encryptions,
and checks whether the number of query-results falling into
the buckets in $\mathcal{A}$ matches those indicated by $\mathcal{P}_{\mathcal{M},t}$. If and
only if there are matches in all the buckets in $\mathcal{A}$, the received
query-results are considered complete.
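
The counting step on the authority side can be illustrated with a minimal sketch. It assumes the authority holds a table `prime_owner` that maps every bucket prime to its (node, bucket) pair, and it covers only the factor decomposition and the count comparison; the reconstruction of the hash for the authenticity check is left out. The function names and the use of the empty tuple for the "no data" bucket are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Bucket = Tuple[int, ...]
Owner = Tuple[int, Bucket]  # (node ID, bucket ID)


def decompose(p_total: int, prime_owner: Dict[int, Owner]) -> Dict[Owner, int]:
    """Recover how many data each node placed in each bucket from the aggregated product."""
    if p_total < 1:
        raise ValueError("aggregated product must be a positive integer")
    counts: Dict[Owner, int] = {}
    remaining = p_total
    for prime, owner in prime_owner.items():
        exponent = 0
        while remaining % prime == 0:
            remaining //= prime
            exponent += 1
        if exponent:
            counts[owner] = exponent
    if remaining != 1:
        raise ValueError("product contains a factor that is not a known bucket prime")
    return counts


def is_complete(counts: Dict[Owner, int], received: Dict[Owner, int],
                queried: Set[Bucket]) -> bool:
    """Compare the expected per-node counts in the queried buckets with what was received."""
    expected = {o: c for o, c in counts.items() if o[1] in queried}
    got = {o: c for o, c in received.items() if o[1] in queried and c}
    return expected == got


prime_owner = {2: (1, (1, 1)), 3: (1, (1, 2)), 5: (1, ()),
               7: (2, (1, 1)), 11: (2, (1, 2)), 13: (2, ())}
counts = decompose(156, prime_owner)  # 156 = 2^2 * 3 * 13
```

Because the set of bucket primes is known in advance, trial division over that set suffices and no general-purpose factoring is needed.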

B. Securing Top-k Queries (STQ) 

Basically, the proposed STQ scheme for securing top-k
query is built upon SRQ in that both the confidentiality-preserving
reporting and query-result completeness verification phases in
SRQ are exploited. In particular, based on the proof generated
in SRQ, since it can know which buckets contain data, the
authority can also utilize such information to examine the
completeness of query-results of top-k query. In other words,
top-k query can be secured by the use of the SRQ scheme.
Because of the similarity between the SRQ and STQ schemes,
some details of the STQ scheme will be omitted in the
following description.

Here, a bucket data set is defined to be composed of
bucket IDs. We use a $d$-dimensional tuple, $(v_1, \ldots, v_d)$, where
$v_g \in [1, w]$, to represent the bucket IDs in a bucket data
set. With this representation, we can use the ranking function,
$R$, to calculate the ranking value of each bucket ID.
Assume that the $v$-th interval in the $g$-th attribute contains
the values in $[u^{l}_{g,v}, u^{h}_{g,v}]$. The ranking value of the bucket ID,
$(v_1, \ldots, v_d)$, is evaluated as
$R\left(\frac{u^{l}_{1,v_1} + u^{h}_{1,v_1}}{2}, \ldots, \frac{u^{l}_{d,v_d} + u^{h}_{d,v_d}}{2}\right)$,
where the $d$-dimensional tuple,
$\left(\frac{u^{l}_{1,v_1} + u^{h}_{1,v_1}}{2}, \ldots, \frac{u^{l}_{d,v_d} + u^{h}_{d,v_d}}{2}\right)$,
whose individual entry is simply averaged over the minimum
and maximum values in each interval, acts as the representative
of the bucket $(v_1, \ldots, v_d)$ for simplicity.

Recall that we simply assume that the data with the
$k$ smallest ranking values are desired. The general form of
the message sent from $s_i$ to its parent node at the end of
epoch $t$ is $(i, t, \mathcal{E}_{i,t}, \mathcal{B}_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t})$, where $\mathcal{E}_{i,t}$, $\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$,
and $\mathcal{P}_{i,t}$ are the same as those defined in SRQ. Assume that
$\zeta_1, \ldots, \zeta_k \in \mathcal{V}$ are the $k$ bucket IDs in the bucket data set whose
ranking values are among the $k$ smallest ones. According
to $\mathcal{B}_{\mathcal{M},t}$, $\mathcal{M}$ can calculate the ranking values of bucket IDs
in $\mathcal{B}_{\mathcal{M},t}$ and, therefore, knows $\zeta_1, \ldots, \zeta_k$. To answer a top-k
query, $(C, t, R, k)$, the storage node $\mathcal{M}$ reports the bucket IDs,
$\zeta_1, \ldots, \zeta_k$, and their corresponding encrypted data, along with
$\mathcal{H}_{\mathcal{M},t}$ and $\mathcal{P}_{\mathcal{M},t}$, to the authority because it can be known that
the data with the $k$ smallest ranking values must be within
$\zeta_1, \ldots, \zeta_k$. After receiving the query-result, the authority can
first verify the authenticity of $\mathcal{P}_{\mathcal{M},t}$ by using $\mathcal{H}_{\mathcal{M},t}$, and verify the
query-result completeness by using $\mathcal{P}_{\mathcal{M},t}$. Note that both of the
above verifications can be performed in a way similar to the
one described in Sec. III-A. Actually, after receiving $\mathcal{P}_{\mathcal{M},t}$, the
authority knows which buckets contain data and the amount of
data. Hence, knowing $u^{l}_{g,v}$ and $u^{h}_{g,v}$, $\forall g \in [1, d]$, $\forall v \in [1, w]$,
the authority can also obtain $\zeta_1, \ldots, \zeta_k$. Afterwards, what the
authority should do is to check if it receives the bucket IDs,
$\zeta_1, \ldots, \zeta_k$, and if the number of data in bucket $\zeta_{g'}$, $g' \in [1, k]$,
is consistent with the number indicated by $\mathcal{P}_{\mathcal{M},t}$. If and only
if these two verifications pass, the authority considers the
received query-result to be complete and extracts the top-k
result from the encrypted data sent from $\mathcal{M}$.
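
The selection of the k best-ranked buckets from their interval midpoints can be sketched as follows. The layout of `bounds`, the tie-breaking by bucket ID, and the function names are assumptions introduced for the example; the ranking function is passed in as an ordinary callable.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Bucket = Tuple[int, ...]
Interval = Tuple[float, float]


def representative(bucket: Bucket, bounds: Sequence[Sequence[Interval]]) -> Tuple[float, ...]:
    """Midpoint of each interval; bounds[g][v - 1] is the v-th interval of attribute g."""
    return tuple((bounds[g][v - 1][0] + bounds[g][v - 1][1]) / 2
                 for g, v in enumerate(bucket))


def top_k_buckets(nonempty: Iterable[Bucket], bounds: Sequence[Sequence[Interval]],
                  rank: Callable[[Tuple[float, ...]], float], k: int) -> List[Bucket]:
    """Return the k non-empty bucket IDs with the smallest ranking values."""
    ordered = sorted(set(nonempty),
                     key=lambda b: (rank(representative(b, bounds)), b))
    return ordered[:k]


bounds = [[(1, 10), (11, 20)], [(1, 10), (11, 20)]]
best = top_k_buckets([(1, 2), (2, 2), (1, 1), (1, 2)], bounds,
                     rank=lambda x: x[0] + x[1], k=2)
# best == [(1, 1), (1, 2)]
```

Both the storage node and the authority can run the same selection, the former on the bucket IDs it stores and the latter on the buckets recovered from the proof, so that their lists can be compared.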

C. Securing Skyline Queries (SSQ) 

To support secure skyline query in sensor networks, in the 
following we first present a naive approach as baseline, and 
then propose an advanced approach that employs a grouping 
technique for simultaneously reducing the computation and 
communication cost. 



1) Baseline scheme: To ensure data confidentiality and
authenticity, as in the SRQ and STQ schemes, the sensed data
are encrypted by using the bucket scheme mentioned in
Sec. III-A. At the end of epoch $t$, each $s_i$ broadcasts its
sensor ID, all the sensed data encrypted by key $K_{i,t}$, and
the proper bucket IDs to all the nodes within the same cell.
Then, according to the broadcast messages, each sensor node
$s_i$ at epoch $t$ has a bucket data set composed of the bucket
IDs extracted from broadcast messages and the bucket IDs
corresponding to its own sensed data. In fact, the bucket data
sets constructed by different nodes in the same cell at epoch
$t$ will be the same. Treating these bucket IDs as data points,
$s_i$ can find the set $\Phi_t$ of skyline buckets, defined as
the bucket IDs not dominated by any other bucket ID. Here,
since the bucket IDs are also represented by $d$-dimensional
tuples, the notion of domination is the same as the one defined
in the query model of Sec. II. Define quasi-skyline data as
the set of data falling into the skyline buckets. It can be
observed that the set of skyline data must be a subset of the
quasi-skyline data. After doing so, each node can locally find$^2$ the
quasi-skyline data, although superfluous data may also be
included. At the end of epoch $t$, if
$K_{i,t}$ is smaller than a pre-determined threshold, then $s_i$ sends
its sensor ID and $hash_{K_{i,t}}(\Phi_t)$ to $\mathcal{M}$. Here, $hash_{K_{i,t}}(\Phi_t)$
works as a proof that can be used for checking
the query-result completeness. Note that only $hash_{K_{i,t}}(\Phi_t)$
needs to be transmitted to $\mathcal{M}$ because $\mathcal{M}$ also receives all the
encrypted data and bucket IDs, and can calculate the
quasi-skyline data by itself after message broadcasting.
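The local computation of skyline buckets and quasi-skyline data can be sketched as follows. This is a minimal illustration rather than the scheme's reference implementation: it assumes that smaller values are preferred in every dimension (the actual notion of domination is the one in the query model), and the names `dominates`, `skyline_buckets`, and `quasi_skyline` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Bucket = Tuple[int, ...]  # a d-dimensional bucket ID


def dominates(a: Bucket, b: Bucket) -> bool:
    """True if a dominates b.

    Assumption: smaller is better in every dimension, so a dominates b
    when a is no larger anywhere and strictly smaller somewhere.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline_buckets(buckets: Set[Bucket]) -> Set[Bucket]:
    """Naive pairwise comparison with O(n^2) running time."""
    return {b for b in buckets if not any(dominates(o, b) for o in buckets)}


def quasi_skyline(data_by_bucket: Dict[Bucket, List[bytes]]) -> Dict[Bucket, List[bytes]]:
    """Keep only the (encrypted) data that fall into skyline buckets."""
    sky = skyline_buckets(set(data_by_bucket))
    return {b: data_by_bucket[b] for b in sky}
```

Because every node in the cell sees the same bucket data set, every node (and the storage node) would obtain the same result from such a routine.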

To answer the skyline query $(\mathcal{C}, t)$, the storage node reports
the quasi-skyline data calculated at epoch $t$ and the hash values
received at epoch $t$ to the authority. Since the authority
knows the threshold and $K_{i,0}$ for all $i$, it will expect to
receive hash values from the set of sensor nodes whose keys are
smaller than the threshold. This baseline scheme works but, due to the
network-wide broadcast, is inefficient
in terms of communication overhead. Hence, an efficient SSQ
scheme exploiting a grouping strategy is proposed as follows.

2) Grouping technique: Like the baseline scheme, the
bucket scheme mentioned in Sec. III-A is also used. Consider
a data set $O$ and a collection $\{O_1, \dots, O_t\}$ of subsets of $O$
satisfying $\bigcup_{j=1}^{t} O_j = O$ and $O_j \cap O_{j'} = \emptyset$, $\forall j \ne j'$, for
$t \ge 1$. An observation is that the skyline data of $O$ must be
a subset of the union of the skyline data of $O_j$, $\forall j \in [1, t]$.
Thus, the key idea of our proposed SSQ scheme is to partition
the sensor nodes in a cell into groups so that broadcasting can
be limited within a group, resulting in reduced computation
and communication costs. In what follows, the SSQ scheme
will be described in more detail.
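The observation behind the grouping technique can be checked on small data sets with a sketch such as the following. It assumes, for illustration only, that smaller values are preferred in every dimension; `partition` (a round-robin split) and `union_of_group_skylines` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Point = Tuple[int, ...]


def dominates(a: Point, b: Point) -> bool:
    # Assumption: smaller is better in every dimension.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: Set[Point]) -> Set[Point]:
    return {p for p in points if not any(dominates(q, p) for q in points)}


def partition(points: List[Point], groups: int) -> List[Set[Point]]:
    """Split distinct points into disjoint subsets whose union is the input."""
    parts: List[Set[Point]] = [set() for _ in range(groups)]
    for idx, p in enumerate(points):
        parts[idx % groups].add(p)
    return parts


def union_of_group_skylines(parts: List[Set[Point]]) -> Set[Point]:
    """Candidate set obtained when each group computes its own skyline."""
    out: Set[Point] = set()
    for part in parts:
        out |= skyline(part)
    return out
```

The candidate set returned by `union_of_group_skylines` may contain superfluous points, but it never loses a true skyline point, which is what allows broadcasting to be confined to a group.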

The sensor nodes in a cell are divided into $\mu$ disjoint
groups, $G_\eta$, $\forall \eta \in [1, \mu]$, each of which is composed of $|G_\eta|$

2 The design of an algorithm for efficiently finding the skyline data given a
data set is a research topic, but is beyond the scope of this paper. We consider
the naive data-wise comparison-based algorithm with running time $O(n^2)$ if
the size of the data set is $n$, but each node, in fact, can implement an arbitrary
algorithm for finding skyline data in our setting.
sensor nodes. The grouping needs to be performed only once,
right after the sensor deployment. Note that each group is
formed by nearby sensor nodes, and the grouping procedure
is independent of the structure of the aggregation tree, so it
does not affect SRQ and STQ. Let a cell-region be the part of the sensing
region monitored by one specified cell. For example, assuming
that the shape of each cell-region is approximately
a square, as shown in Fig. 1, and that sensor nodes are uniformly
deployed, the grouping can be achieved by
simply dividing a cell-region into $\mu$ ($= \sqrt{N}$) sub-cell-regions.
The sensor nodes in the same sub-cell-region form a group.
Note that the square cell-region is assumed here for ease of
explanation, but is not necessary for the grouping procedures$^3$.

At the end of epoch $t$, each sensor node $s_i$ broadcasts its
sensor ID, its order seed $\theta_{i,t} = hash_{K_{i,t}}(i)$, all the sensed data
encrypted with the key $K_{i,t}$, and the proper bucket IDs to all
the nodes within the same group. After doing so, as in the baseline
scheme, the sensor nodes in the same group can locally find
the quasi-skyline data from the bucket data set whose entries
are generated by the sensor nodes in the same group. Let $\Phi_{\eta,t}$
be the set of skyline bucket IDs and $\phi_{\eta,t}$ their corresponding
quasi-skyline data encrypted by the sensor nodes in group $\eta$
using proper keys at epoch $t$. At the end of epoch $t$, if $\theta_{i,t}$
is among the $\xi_1$ smallest order seeds
in group $\eta$ at epoch $t$, where $\xi_1$ is a pre-determined threshold
known by each node, then $s_i$ reports $\Phi_{\eta,t}$, $\phi_{\eta,t}$, its verification
seed $hash_{K_{i,t}}(\Phi_{\eta,t})$, and the IDs of the sensor nodes generating
$\phi_{\eta,t}$ to $\mathcal{M}$. In fact, $\xi_1 = 1$ is sufficient for the verification
purpose. $\xi_1$, however, is also related to the resiliency against
sensor node compromises. Thus, we still keep $\xi_1$ as a variable
and defer the explanation of its purpose to Sec. V.
Here, the purpose of verification seeds is that the completeness
of quasi-skyline data can be guaranteed by exactly $\xi_1$ sensor
nodes for each group, while the purpose of the order seed is to
guarantee that at each epoch exactly $\xi_1$ sensor nodes will send
the verification seeds as the proofs.
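The selection of the reporting nodes through order seeds can be sketched as follows. The keyed hash is instantiated with HMAC-SHA256 purely as an assumption; the 4-byte encoding of the node ID, the `repr`-based encoding of the bucket set, and the function names are likewise hypothetical choices for illustration.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

Bucket = Tuple[int, ...]


def keyed_hash(key: bytes, message: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the keyed hash function.
    return hmac.new(key, message, hashlib.sha256).digest()


def order_seed(node_id: int, key: bytes) -> bytes:
    """Order seed of a node: the keyed hash of its own ID under its epoch key."""
    return keyed_hash(key, node_id.to_bytes(4, "big"))


def reporters(group_keys: Dict[int, bytes], xi1: int) -> List[int]:
    """IDs of the xi1 nodes of one group that hold the smallest order seeds."""
    ranked = sorted(group_keys, key=lambda i: order_seed(i, group_keys[i]))
    return ranked[:xi1]


def verification_seed(key: bytes, skyline_bucket_ids: List[Bucket]) -> bytes:
    """Keyed hash of the group's skyline bucket set (canonical, sorted encoding)."""
    return keyed_hash(key, repr(sorted(skyline_bucket_ids)).encode())
```

Since the authority knows every epoch key, it could run the same `reporters` computation to learn which nodes are expected to send verification seeds in each group.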

To answer a skyline query $(\mathcal{C}, t)$, the storage node $\mathcal{M}$
reports to the authority a hash of all the received verification seeds,
$h_t = hash(\Vert_{i \in \Gamma_t} hash_{K_{i,t}}(\Phi_{\eta,t}))$, where $\Vert$ denotes
bit-string concatenation and $\Gamma_t$ is the set of sensor nodes responsible
for sending a hash value to $\mathcal{M}$ in each group at epoch $t$, together with the
set of skyline bucket IDs and their corresponding encrypted
data received at epoch $t$. Since it knows the
threshold $\xi_1$ and $K_{i,t}$, $\forall i \in \{1, \dots, N-1, \mathcal{M}\}$, the authority
will expect to receive a particular hash value from $\mathcal{M}$. If
and only if the hash sent from $\mathcal{M}$ matches the hash of
the verification seeds calculated by the authority itself according
to its knowledge of $\Gamma_t$ and $K_{i,t}$, the received data are
considered complete, and contain the skyline data.
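The authority's check of the storage node's proof can be sketched as follows. The sketch assumes HMAC-SHA256 for the keyed hash, SHA-256 for the outer hash, and concatenation of the verification seeds in increasing node-ID order; none of these choices is fixed by the scheme description, and all names are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List, Tuple

Bucket = Tuple[int, ...]


def keyed_hash(key: bytes, message: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the keyed hash function.
    return hmac.new(key, message, hashlib.sha256).digest()


def encode(buckets: List[Bucket]) -> bytes:
    """Canonical byte encoding of a set of skyline bucket IDs."""
    return repr(sorted(buckets)).encode()


def proof(seeds_in_order: List[bytes]) -> bytes:
    """Hash of the concatenated verification seeds."""
    return hashlib.sha256(b"".join(seeds_in_order)).digest()


def authority_accepts(h_t: bytes,
                      reporter_keys: List[Tuple[int, bytes, int]],
                      reported: Dict[int, List[Bucket]]) -> bool:
    """Recompute the proof from the reported skyline buckets and compare.

    reporter_keys holds (node ID, epoch key, group index) for every node
    expected to report; reported maps a group index to the skyline bucket
    IDs that the storage node returned for that group.
    """
    expected = [keyed_hash(k, encode(reported[g])) for _, k, g in sorted(reporter_keys)]
    return hmac.compare_digest(h_t, proof(expected))
```

If the storage node omits a skyline bucket of any group, the recomputed seed of that group's reporters changes and the comparison fails.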



3 For example, the authority knowing the position of each node or the use of 
clustering algorithms can also divide nodes into groups. In general, after the 
localization, each sensor node can join the proper group possibly according 
to its geographic position when the grouping information such as the sizes of 
sensing region and cell-region are preloaded in sensor nodes. 



IV. Performance Evaluation 

We focus on analyzing the critical issue of detecting
an incomplete query-result in tiered networks. In this section,
the detection probability and communication cost of
query-result completeness verification in the proposed schemes
are analyzed. It is assumed that the number of hops between
$\mathcal{M}$ and each sensor node is $\sqrt{n}$ for a collection of $n$ uniformly
deployed nodes [1]. Both the detection probability
and the communication cost are discussed at a fixed epoch $t$.

As the communication cost of encoding approach [21] 
grows exponentially with the number of attributes, and some 
crosscheck approaches [29], [34] have relatively low detection 
probability, in the following, we compare SRQ with only 
hybrid crosscheck [34], which achieves the best balance be- 
tween the detection probability and communication cost in the 
literature. Note that, the parameter setting required in hybrid 
crosscheck is the same as that listed in [34]. 

A. Detection Probability 

The detection probability is defined as the probability that
the compromised storage node $\mathcal{M}$ is detected if it returns
an incomplete query-result. Since the larger the
portion of the query-result $\mathcal{M}$ drops, the higher the probability that
the authority detects it, we consider, as the lower bound of the
detection probability, the worst case in which only
one bucket and its corresponding data in the query-result are
dropped by $\mathcal{M}$ and the number of data sensed by a node is
either 0 or 1.

1) Detection probability for SRQ and STQ: To return an
incomplete query-result without being detected by the authority,
$\mathcal{M}$ should create a proof, $(\mathcal{H}_{\mathcal{M},t}, \mathcal{P}_{\mathcal{M},t})$, corresponding
to the incomplete query-result. Since bucket primes can be
known by the adversary, $\mathcal{P}_{\mathcal{M},t}$ can be easily constructed.
Nevertheless, $\mathcal{H}_{\mathcal{M},t}$ cannot be constructed, since the bucket
keys of the sensor nodes generating the bucket dropped by $\mathcal{M}$ are
not known by the adversary. Therefore, only two options can
be chosen by the adversary. First, the adversary can directly
guess $\mathcal{H}_{\mathcal{M},t}$, succeeding with probability $2^{-\ell_h}$, where $\ell_h$
is the number of bits output by a keyed hash function. This
implies that the detection probability $P^{SRQ}_{det,1}$ is $1 - 2^{-\ell_h}$ for
the first case. Second, knowing the aggregation tree topology,
the adversary can also follow the rule of SRQ to construct
$\mathcal{H}_{\mathcal{M},t}$ without considering the bucket key of the dropped bucket.
Assume that the probability that a sensor node has sensed
data is $\delta$. The size of the bucket key pool can be derived as
$(N-2)(2\delta + 1 - \delta) + 2 = N\delta + N - 2\delta$. Thus, the probability
for the adversary to guess successfully is $2^{-\ell_k(N\delta + N - 2\delta)}$,
where $\ell_k$ is the number of bits of a key, leading to the
detection probability $P^{SRQ}_{det,2} = 1 - 2^{-\ell_k(N\delta + N - 2\delta)}$ for the
second case. Overall, the final detection probability, $P^{SRQ}_{det}$,
is $\min\{P^{SRQ}_{det,1}, P^{SRQ}_{det,2}\}$. On the other hand, as stated in Sec.
III-B, the STQ scheme is built upon the SRQ scheme. Thus,
the detection probability, $P^{STQ}_{det}$, will be the same as $P^{SRQ}_{det}$.
As Fig. 2 depicts, the detection probability of SRQ is close
to 1 in any case. However, hybrid crosscheck is effective only
when a few sensed data are generated in the network. Such
a performance difference can be attributed to the fact that the
sensed data in the network are securely and deterministically
summarized in the proof of SRQ but they are probabilistically
summarized in hybrid crosscheck.
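Because the adversary's success probabilities are far below floating-point resolution, the two cases are easier to compare through their base-2 logarithms. The following sketch does so under that convention; the function names and the parameter values used in any call are hypothetical.

```python
from fractions import Fraction
from typing import Union

Number = Union[int, Fraction]


def key_pool_size(n: int, delta: Number) -> Number:
    """(N - 2)(2*delta + 1 - delta) + 2, which simplifies to N*delta + N - 2*delta."""
    return (n - 2) * (2 * delta + 1 - delta) + 2


def log2_guess_hash(l_h: int) -> int:
    """log2 of the probability of guessing an l_h-bit keyed hash output."""
    return -l_h


def log2_guess_keys(n: int, delta: Number, l_k: int) -> Number:
    """log2 of the probability of guessing every key in the bucket key pool."""
    return -l_k * key_pool_size(n, delta)


def log2_undetected(n: int, delta: Number, l_h: int, l_k: int) -> Number:
    """The detection probability is the minimum over the two cases, so the
    probability of escaping detection is the maximum of the two guesses."""
    return max(log2_guess_hash(l_h), log2_guess_keys(n, delta, l_k))
```

For instance, with 500 nodes, a sensing probability of 1/2, and 80-bit keys and hashes, `log2_undetected` returns -80: guessing the hash directly is by far the adversary's better option, and the detection probability is 1 - 2^-80.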





Fig. 2: The detection probability of SRQ and hybrid crosscheck in the cases that (a) 
Y = 100 and (b) Y = 5000. 

2) Detection probability for SSQ: To return an incomplete
query-result without being detected by the authority,
$\mathcal{M}$ should forge a proof, i.e., a hash $h_t$ of all the received
verification seeds, corresponding to the incomplete
query-result. Therefore, only two options can be chosen by the
adversary. First, the adversary can directly guess a hash value
$h_t$. The probability for the adversary to guess successfully is
$2^{-\ell_h}$, implying that the detection probability is $P^{SSQ}_{det,1} = 1 - 2^{-\ell_h}$
in this case. Second, the adversary can also follow the rule
of SSQ to construct $h_t$ for incomplete data. In this case, the
adversary is forced to guess $\xi_1$ keys for each one of the $\mu$ groups,
leading to a success probability of $2^{-\ell_k \mu \xi_1}$ and
a detection probability of $P^{SSQ}_{det,2} = 1 - 2^{-\ell_k \mu \xi_1}$. The
final detection probability, $P^{SSQ}_{det}$, is thus $\min\{P^{SSQ}_{det,1}, P^{SSQ}_{det,2}\}$.
Obviously, $P^{SSQ}_{det}$ will also be close to 1 when an appropriate key
length or hash function is selected.

B. Communication Cost 

The communication cost, $T$, is defined as the number of bits
in the communications required for the proposed schemes. We
are mainly interested in the asymptotic result in terms of $d$
and $N$ because they reflect the scalability with respect to the number of
attributes and the network size, respectively. We do not count
the number of bits in representing data, $\mathcal{E}_{i,t}$, since the sending
of $\mathcal{E}_{i,t}$ is necessary in any data collection scheme. We further
assume that there are on average $p_{dtob}Y$ data buckets, where
$0 \le p_{dtob} \le 1$, generated in cell $\mathcal{C}$.



1) Communication Cost of SRQ and STQ: Each sensor
node $s_i$ in SRQ is required to send $\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ to
its parent node. Nevertheless, $\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ can be
aggregated along the path in the aggregation tree. As a consequence,
$s_i$ actually performs only a one-hop broadcast containing
$\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ once at each epoch. In addition, to answer
a range query, $\mathcal{M}$ is responsible for sending $\mathcal{H}_{\mathcal{M},t}$, $\mathcal{P}_{\mathcal{M},t}$,
and the bucket IDs in $A$. In summary, the communication
cost, $T^{SRQ}$, can be calculated as $(N-1)(\ell_h + \ell_p) +
p_{dtob}Y\lceil \log w \rceil d \log N + \ell_h + \ell_p + |A|\lceil \log w \rceil d = O(N +
d \log N)$, where $\ell_p$ is the number of bits used to represent the
bucket prime $\mathcal{P}_{\mathcal{M},t}$. Due to the similarity between SRQ and
STQ, the communication cost, $T^{STQ}$, can also be calculated
as $O(N + d \log N)$ in a way similar to the one for obtaining
$T^{SRQ}$.
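The cost expression for SRQ can be evaluated numerically with a short sketch such as the one below. It assumes base-2 logarithms and reads the expression as three parts (aggregation along the tree, delivery of bucket IDs to the storage node, and the answer sent to the authority); the function name `t_srq` and the parameter values in any call are hypothetical.

```python
import math


def t_srq(n: int, d: int, w: int, y: int, p_dtob: float,
          a_size: int, l_h: int, l_p: int) -> float:
    """Bits spent by SRQ in one epoch, with logarithms taken in base 2.

    n: nodes per cell, d: attributes, w: buckets per dimension,
    y and p_dtob: factors of the average number of data buckets,
    a_size: number of bucket IDs returned, l_h / l_p: hash and prime lengths.
    """
    bucket_id_bits = math.ceil(math.log2(w)) * d
    aggregation = (n - 1) * (l_h + l_p)
    bucket_ids = p_dtob * y * bucket_id_bits * math.log2(n)
    answer = l_h + l_p + a_size * bucket_id_bits
    return aggregation + bucket_ids + answer
```

With 80-bit hashes and 1000-bit primes, the first term dominates for realistic cell sizes, which is consistent with the linear dependence on the number of nodes.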





Fig. 3: The communication cost of SRQ and hybrid crosscheck in the cases that (a) 
Y = 100 and (b) Y = 5000. 

As shown in Fig. 3, where the parameters $\ell_h = 80$ and
$\ell_p = 1000$ are used, the communication cost of SRQ
is significantly lower than that of hybrid crosscheck. More
specifically, the communication cost of hybrid crosscheck
can be asymptotically represented as $O(N^2 + Nd)$ and
increases drastically with $N$ and $d$. The proposed SRQ scheme,
however, exhibits low communication cost regardless of the
amount of sensed data in the network, because the
size of the proof used in SRQ is always a constant. Hence,
the communication cost of SRQ is dominated by the
aggregation procedure, the average hop distance between $\mathcal{M}$
and each node, and the transmission of bucket IDs, resulting
in $O(N + d \log N)$ communication cost.

2) Communication Cost of SSQ: Since grouping is only
performed once after sensor deployment, we ignore its
communication cost. Note that, in the following, the communication
cost of the data should be counted because it is involved
in the design of SSQ. After the grouping, each $s_i$ broadcasts
the data, bucket IDs, sensor ID, and the order seed to all the
nodes within the same group. Assuming that the nodes employ
a duplicate suppression algorithm, by which each node
broadcasts a given message only once, one node should broadcast
a message with $Y\ell_d + p_{dtob}Y\lceil \log w \rceil d + \ell_{id} + \ell_h$ bits, where
$\ell_d$ is the average size of a datum and $\ell_{id}$ is the number of bits
required to represent sensor IDs, resulting in a communication
cost of $C_1 = |G_\eta|^2 (Y\ell_d + p_{dtob}Y\lceil \log w \rceil d + \ell_{id} + \ell_h)$ bits
in each group. Let $0 \le p_q \le 1$ be the average ratio of
the quasi-skyline data to all the sensed data. $|\Phi_{\eta,t}|$ is equal
to $p_{dtob}p_qY$ on average. Then, once its order seed is
among the $\xi_1$ smallest order seeds in a group,
$s_i \in G_\eta$ is required to send $\Phi_{\eta,t}$, $\phi_{\eta,t}$, $hash_{K_{i,t}}(\Phi_{\eta,t})$,
its sensor ID, and the IDs of the sensor nodes generating buckets
in $\Phi_{\eta,t}$ to $\mathcal{M}$, implying a communication cost of $C_2 =
p_{dtob}p_qY\lceil \log w \rceil d + p_qY\ell_d + \ell_h + \ell_{id} + |G_\eta|\ell_{id}$ bits in the worst
case. Note that the verification seeds cannot be aggregated,
although they can also be delivered along the
path to $\mathcal{M}$ on the aggregation tree. $\mathcal{M}$ needs to send the
skyline bucket IDs, its sensor ID, and a hash $h_t$ as the proof to
the authority for answering the query. The communication cost
of $\mathcal{M}$ is $C_3 = \ell_{id} + \ell_h + \sum_{\eta=1}^{\mu} p_{dtob}p_qY\lceil \log w \rceil d + p_qY\ell_d$.
Consequently, the upper bound of the communication cost, $T^{SSQ}$,
can be obtained as $\mu C_1 + \mu\xi_1 C_2 \log N + C_3 = O(N^{3/2} + Nd)$
when $\mu = \sqrt{N}$ and $|G_\eta| = \sqrt{N}$, $\forall \eta \in [1, \mu]$. By similar
derivation, the communication cost of the baseline scheme is
$O(N^2 d)$. Thus, exploiting the proposed grouping technique
does reduce the required communication cost. The trends of
communication cost in SSQ are shown in Fig. 4.





Fig. 4: The communication cost of SSQ for (a) Y = 100 and (b) Y = 5000. 

V. Impact of Sensor Node Compromise 

In the previous discussions, we have ignored the impact of
the compromise of sensor nodes. In practice, the adversary
could take control of sensor nodes to enhance its ability
to perform malicious operations. The notation $\mathcal{S}$ is used
to denote a set of random sensor nodes compromised by
the adversary. In the collusion attack considered here, $\mathcal{M}$
colludes with $\mathcal{S}$ in the hope that more portions of the query-results
generated by innocent sensor nodes can be dropped. Since
crosscheck approaches [34] suffer from collusion attacks, the
impact of the collusion attack on secure range query for tiered
sensor networks was addressed in [34]. Their proposed method
is random probing, by which the authority occasionally checks
whether some randomly selected sensor nodes indeed sensed no
data by directly communicating with them. Random probing,
however, can only discover incomplete query-results in
an inefficient, luck-dependent way and cannot identify $\mathcal{S}$.

On the other hand, in this paper, we identify a new
Denial-of-Service attack never addressed in the literature, called the
false-incrimination attack, by which $\mathcal{S}$ provides false sub-proofs to
the innocent storage node so that the innocent storage node
will be regarded as the compromised one and be revoked.
Unfortunately, all the prior works [21], [29], [34] suffer from
this attack. In summary, with minor modifications involved,
our proposed SRQ, STQ, and SSQ schemes are resilient
against both the collusion attack and the false-incrimination attack.

It should be especially noted that, in the following discussion
of SRQ, we temporarily make two unrealistic assumptions:
the compromised nodes can only disobey the procedures
of the proposed schemes$^4$, and the (subtree) proofs (defined
later) will not be manipulated by the compromised storage
and sensor nodes. These assumptions serve to emphasize the effectiveness
of our proposed technique in identifying compromised nodes,
and they will be relaxed later.

Impact of Sensor Node Compromise on SRQ and STQ.
Under the above two assumptions, the SRQ scheme is
inherently resilient against the collusion attack because, regardless
of the existence and position of $\mathcal{S}$, the proper bucket keys
will be embedded into the proofs and cannot be removed
by $\mathcal{S}$. Nevertheless, SRQ could be vulnerable to the
false-incrimination attack since false sub-proofs injected by $\mathcal{S}$ will
be integrated with the other correct sub-proofs to construct
a false proof, leading to the revocation of the innocent $\mathcal{M}$. Here,
we present a novel technique called subtree sampling, enabling
SRQ, with a slight modification, to efficiently mitigate the
threat of false-incrimination attacks. The idea of subtree
sampling is to check whether the proof constructed by the nodes
in a random subtree with fixed depth is authentic, so that
the attestation is performed only on the remaining suspicious
nodes. Let $m$ be a user-selected constant indicating the
subtree depth. In the modified SRQ scheme, once receiving
$(j_\rho, t, \mathcal{E}_{j_\rho,t}, \mathcal{B}_{j_\rho,t}, \mathcal{H}_{j_\rho,t}, \mathcal{P}_{j_\rho,t}, \vartheta^0_{j_\rho,t}, \dots, \vartheta^{m-1}_{j_\rho,t})$, $j_\rho \in [1, N-1]$,
$\forall \rho \in [1, \chi]$, from its $\chi$ children, $s_{j_1}, \dots, s_{j_\chi}$, each $s_i$
calculates $\mathcal{E}_{i,t}$, $\mathcal{B}_{i,t}$, $\mathcal{H}_{i,t}$, and $\mathcal{P}_{i,t}$ as in the original SRQ
scheme. Note that, if $s_i$ is a leaf node on the aggregation
tree, it is assumed that $s_i$ receives $(0, 0, 0, 0, 0, 1, 0, \dots, 0)$. In
the modified SRQ scheme, however, $s_i$ additionally performs
4 In other words, compromised nodes are assumed not to inject bogus sensor
readings. They can only manipulate their own sub-proofs and the proofs sent from
their descendant sensor nodes.
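the following operations. Assume that $\mathcal{H}^u_{j_\rho,t}, \mathcal{P}^u_{j_\rho,t} \in \vartheta^u_{j_\rho,t}$,
$\forall u \in [0, m-1]$, $\forall j_\rho \in [1, N-1]$, and $\forall \rho \in [1, \chi]$. $s_i$
calculates $\mathcal{H}^u_{i,t} = hash_{K_{i,t}}(\sum_{\rho=1}^{\chi} \mathcal{H}^{u-1}_{j_\rho,t} + \mathcal{H}_{i,t})$ and $\mathcal{P}^u_{i,t} =
\prod_{\rho=1}^{\chi} \mathcal{P}^{u-1}_{j_\rho,t}$, $\forall u \in [1, m]$. $s_i$ also calculates $\mathcal{H}^0_{i,t} = \mathcal{H}_{i,t}$
and $\mathcal{P}^0_{i,t} = \mathcal{P}_{i,t}$, where $\mathcal{H}_{i,t}$ and $\mathcal{P}_{i,t}$ are computed in a way
stated in Sec. III-A. Then, $\mathcal{H}^u_{i,t}$ and $\mathcal{P}^u_{i,t}$ are assigned to set
$\vartheta^u_{i,t}$, $\forall u \in [0, m-1]$. If $hash_{K_{i,t}}(i) < \xi_2$, where $\xi_2$ is a
pre-determined threshold known by each node and will be
analyzed later, then $s_i$ sends $\vartheta^m_{i,t}$ to the (possibly compromised)
$\mathcal{M}$. Let $T^m_i$ be a subtree of the underlying aggregation tree,
rooted at $s_i$ with depth $m$. $\vartheta^m_{i,t}$ generated by $s_i$ can be thought
of as the subtree proof of the data sensed by the nodes in $T^m_i$.
Finally, $s_i$ sends $(i, t, \mathcal{E}_{i,t}, \mathcal{B}_{i,t}, \mathcal{H}_{i,t}, \mathcal{P}_{i,t}, \vartheta^0_{i,t}, \dots, \vartheta^{m-1}_{i,t})$ to
its parent node. Let $W_t$ be the witness set of sensor nodes
satisfying $hash_{K_{i,t}}(i) < \xi_2$ at epoch $t$. The nodes in $W_t$ are
called witness nodes at epoch $t$.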
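The witness test, and the number of witness nodes it yields on average, can be sketched as follows. HMAC-SHA256 truncated to its leading bits is used here only as an assumed stand-in for the keyed hash; the function names, the 4-byte ID encoding, and the default hash length are hypothetical.

```python
import hashlib
import hmac


def is_witness(node_id: int, key: bytes, xi2: int, l_h: int = 32) -> bool:
    """A node is a witness when the keyed hash of its ID falls below xi2.

    Assumption: the keyed hash is HMAC-SHA256 truncated to its l_h
    leading bits and interpreted as an unsigned integer.
    """
    digest = hmac.new(key, node_id.to_bytes(4, "big"), hashlib.sha256).digest()
    value = int.from_bytes(digest, "big") >> (256 - l_h)
    return value < xi2


def expected_witnesses(n: int, xi2: int, l_h: int) -> float:
    """Each of the n - 1 sensor nodes passes the test with probability
    xi2 / 2**l_h when the hash output is uniformly distributed."""
    return xi2 * (n - 1) / 2 ** l_h
```

Because the test depends on the epoch key, the witness set changes from epoch to epoch, and the authority, which knows all keys, can recompute it.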







Fig. 5: The conceptual diagrams of identifying the compromised nodes. (a) Only the
nodes in red area need to be attested. (b) Only the nodes in gray area need to be attested.

We first consider the simplest case, where no compromised
nodes act as witness nodes. For simplicity, we further assume that only
one compromised node (i.e., $|\mathcal{S}| = 1$) injects a false sub-proof;
our method can be adapted to the case of
multiple compromised nodes injecting false sub-proofs. We
have the following observation regarding each witness node $s_i$
and its generated $\vartheta^m_{i,t}$. Assume that the query-result is found to
be incomplete at epoch $t$. The authority requests $\vartheta^m_{i,t}$, stored
in $\mathcal{M}$, for each witness node $s_i$. To identify the compromised
nodes, all the nodes are at first considered neutral. The nodes
in $T^m_i$ become innocent if the verification of $\vartheta^m_{i,t}$ passes, and
the nodes in $T^m_i$ become suspicious otherwise$^5$. The above
verification is performed as follows. According to the $\mathcal{P}^m_{i,t}$ in
the subtree proof, the authority knows which nodes in the
subtree $T^m_i$ contribute data and their amount. The authority
can, based on this information, calculate $\mathcal{H}^m_{i,t}$ by itself. Define
$\mathcal{E}_{T^m_i,t}$ as the set of data sensed by the nodes in $T^m_i$, and $\mathcal{B}_{T^m_i,t}$
as the corresponding bucket IDs of $\mathcal{E}_{T^m_i,t}$. As a consequence,
the verification passes if and only if the $\mathcal{H}^m_{i,t}$ calculated by
the authority itself is equal to the $\mathcal{H}^m_{i,t}$ extracted from the
received $\vartheta^m_{i,t}$, and the amount of the data in $\mathcal{E}_{T^m_i,t}$
falling into the specified buckets in $\mathcal{B}_{T^m_i,t}$ matches the one
indicated in $\mathcal{P}^m_{i,t}$. As a whole, each time the above checking
procedures are performed according to $\vartheta^m_{i,t}$ at epoch $t$, nodes
will be partitioned into three sets, the innocent set $\mathfrak{I}^i_t$, the neutral
set, and the suspicious set $\mathfrak{S}^i_t$, containing innocent nodes, neutral
5 Note that there would be the cases that a node is deemed to be both 
innocent and suspicious when different subtree proofs are considered. When 
such a case happens, that node obviously should be innocent. 



nodes, and suspicious nodes, respectively. We can conclude
that $(\bigcap_{i \in W_t, \mathfrak{S}^i_t \ne \emptyset} \mathfrak{S}^i_t) \cup \{\mathcal{M}\}$ contains at least one
compromised node injecting the false sub-proof if at least one $\mathfrak{S}^i_t$ is
nonempty, and $(\{s_1, \dots, s_{N-1}\} \setminus (\bigcup_{i \in W_t} \mathfrak{I}^i_t)) \cup \{\mathcal{M}\}$ contains
at least one compromised node injecting the false sub-proof
otherwise. In more detail, the authority at first only performs
the attestation [23], [24], [26] on the nodes in $\bigcap_{i \in W_t, \mathfrak{S}^i_t \ne \emptyset} \mathfrak{S}^i_t$
or in $\{s_1, \dots, s_{N-1}\} \setminus (\bigcup_{i \in W_t} \mathfrak{I}^i_t)$. If, after the
attestation, the attested nodes are ensured to be not
compromised, then $\mathcal{M}$ should be the compromised node. It
can be observed that the set of nodes on which the
authority needs to perform the attestation may
shrink drastically, so that the computation and communication
cost required in the attestation will be substantially
reduced as well. It can also be observed that each time the
attestation is performed, at least one compromised node can
be recovered. The intuition behind the checking procedure
is illustrated in Fig. 5. In addition, Fig. 6 depicts the number
of nodes to be attested after an incomplete reply is found
in different settings. Since the number of witness nodes is
approximately $\xi_2(N-1)/2^{\ell_h}$, it can be observed from Fig. 6
that the larger the $\xi_2$ and $m$, the lower the number of nodes
to be attested. Nevertheless, when $\xi_2$ and $m$ become larger,
the communication cost of the modified SRQ scheme, which
will be presented later, is increased as well.
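The narrowing of the set of sensor nodes to be attested can be sketched with plain set operations. The sketch assumes that the outcome of each subtree-proof verification is already known; the function name and the dictionary-based inputs are hypothetical. The storage node is not part of the returned set, since it is blamed only if all attested sensor nodes turn out to be clean.

```python
from typing import Dict, List, Set


def nodes_to_attest(all_nodes: Set[int],
                    subtree_of: Dict[int, Set[int]],
                    passed: Dict[int, bool]) -> Set[int]:
    """Sensor nodes the authority would attest first.

    subtree_of maps a witness ID to the node IDs in its depth-m subtree;
    passed maps a witness ID to the outcome of its subtree-proof check.
    """
    innocent: Set[int] = set()
    suspicious_sets: List[Set[int]] = []
    for witness, nodes in subtree_of.items():
        if passed[witness]:
            innocent |= nodes
        else:
            suspicious_sets.append(set(nodes))
    if suspicious_sets:
        # Intersect the nonempty suspicious sets; a node that some other
        # proof cleared is treated as innocent.
        return set.intersection(*suspicious_sets) - innocent
    # No subtree proof failed: only nodes not covered by a passed proof remain.
    return all_nodes - innocent
```

For example, with ten nodes and witnesses 2, 3, and 4 covering subtrees {2, 4, 5}, {3, 6, 7}, and {4, 8}, a failed check at witness 2 alone leaves only nodes 2 and 5 to attest.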

There would, however, still be cases where compromised
nodes luckily act as witness nodes, so that they can be
considered innocent by sending a genuine subtree proof to $\mathcal{M}$.
This case does happen, but our technique
still mitigates the threat of false-incrimination
attacks because their effectiveness
is now limited to the case where some compromised nodes
work as witness nodes.





Fig. 6: The number of nodes to be attested for (a) N = 500 and (b) N = 1000. 
Now, we remove the two unrealistic assumptions
made before, allowing that bogus data can be injected and that
the subtree proof sent from the witness node may
be maliciously altered on its way to the authority. After
the removal of these two assumptions, SRQ is
still resilient against collusion attacks because the use of the
proofs guarantees that the misbehavior of deleting the sensed
data will be detected. Unfortunately, on the one hand, the
compromised nodes injecting bogus data can deceive the
authority into accepting falsified sensor readings. On the
other hand, the compromised nodes lying on the path between
$\mathcal{M}$ and the witness nodes can manipulate the subtree proofs so
that the compromised nodes can avoid detection and the
innocent $\mathcal{M}$ can still be falsely incriminated. In our strategy,
we use the redundancy property, which has been widely used
in the design of other security protocols in WSNs such
as en-route filtering, to mitigate the former threat, while,
motivated by the IP traceback technique in the Internet security
literature, we develop a new traceback technique suitable for WSNs
to resist the latter attack. Here, the redundancy property
means that WSNs are usually densely deployed so that an
event in the sensing region can be simultaneously detected by
multiple nodes. In general, the redundancy property can be
obtained by achieving the so-called $t$-coverage, so that
an event can be simultaneously observed by at least $t$ nodes.
In the following description of our remedy to these problems,
due to its similarity to the modified SRQ scheme previously
mentioned, we omit some notational details and stress
the procedures themselves.

When the redundancy property is used, our
SRQ does not, in essence, need to be changed. Assume for now that the
authority issues a range query and receives the query-result
from $\mathcal{M}$. In addition, we also assume that all the checking
procedures stated in the (modified) SRQ are passed. The authority
now wants to know whether the received data are falsified by
and sent from compromised nodes. Recall that each node
can be aware of its geographic position. After learning from the
query-result which nodes contribute sensed data to its issued
query, the authority further acquires the encrypted data
of the neighbors of those nodes from $\mathcal{M}$. Note that this can
be achieved because, when the geographic positions of all nodes
are known by the authority$^6$, inferring the one-hop neighbors
of a specific node is easy. Finally, for each
node in the query-result, the authority checks the consistency
of its sensor reading with the sensor readings of its one-hop
neighbors. The sensed data will be rejected if it is
inconsistent with the sensor readings of its neighbors. The
consistency checking procedure here may allow for certain
measurement errors or environmental factors, which should
be domain-specific and user-defined.

Now, we turn to address another problem that the subtree 
proofs transmitted from the witness node to the authority 

6 Once each node knows its position, it sends its position to the authority
in a multihop manner, along with a MAC constructed with the key uniquely
shared with the authority. This operation is performed only once,
after the sensor deployment.



could be maliciously altered by the compromised (storage
or sensor) nodes. To deal with this kind of threat, we need
a mechanism by which the receiver not only can know
whether the received message has been modified by the intermediate
nodes, but also can point out the node modifying the message
if it exists. Motivated by the IP traceback techniques,
we develop a recursive traceback mechanism. More
specifically, each node $s_i$ on the path connecting $\mathcal{M}$ and
the witness node, after receiving $D$, where $D$ denotes the
subtree proof, from its descendant node, attaches $hash_{K_{i,t}}(D)$
to $D$ and then forwards $D \Vert hash_{K_{i,t}}(D)$ to its ascendant node
on the underlying aggregation tree. Note that it is assumed that
the witness node receives $D \Vert \emptyset$. Thus, with this modification, a
subtree proof received by the authority should be accompanied
by $\psi$ hashes, where $\psi$ is the number of nodes (including
storage nodes and the witness node itself) between a specific
witness node and the authority. Here, we should note that
because the topology of the aggregation tree is known by
the authority, when the subtree proof is sent, the IDs of
intermediate nodes, except for the ID of the witness node
itself, do not need to be attached$^7$. Hence, when the
query-result is deemed incomplete, before conducting the procedures
defined in the subtree sampling technique to attest nodes,
the authority checks whether the received subtree proof has been
maliciously altered by an intermediate node. In particular,
assume that the subtree proof and its $\psi$ associated hashes,
$D \Vert h^D_1 \Vert \cdots \Vert h^D_\psi$, where $h^D_i$, $1 \le i \le \psi$, is the hash calculated
by the node that is $(i-1)$ hops away from the witness node
and $h^D_1$ is computed by the witness node itself according to
its asserted subtree proof, are received by the authority. After
the reception of $D \Vert h^D_1 \Vert \cdots \Vert h^D_\psi$, the authority checks the
consistency of the hashes backward; i.e., it first checks $h^D_\psi$,
then $h^D_{\psi-1}$, and so on. If such a verification succeeds
for all $\psi$ hashes, then the subtree proof $D$ is considered
to be intact. Otherwise, once the verification fails at, say, $h^D_i$,
we can conclude that the subtree proof was altered by the
corresponding node, since an innocent node is not assumed to
behave in such a way.
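The backward check of the traceback hashes can be sketched as follows. The sketch assumes HMAC-SHA256 as the keyed hash and assumes that every node on the path hashes the subtree proof exactly as it received it; the function names `forward` and `first_failure` are hypothetical, and how the returned index is mapped to the node to blame follows the rule stated above rather than anything in the code.

```python
import hashlib
import hmac
from typing import List, Optional


def keyed_hash(key: bytes, message: bytes) -> bytes:
    # Assumption: HMAC-SHA256 stands in for the keyed hash function.
    return hmac.new(key, message, hashlib.sha256).digest()


def forward(d: bytes, hashes: List[bytes], key: bytes) -> List[bytes]:
    """A node on the path appends its keyed hash of the proof it holds."""
    return hashes + [keyed_hash(key, d)]


def first_failure(d: bytes, hashes: List[bytes], path_keys: List[bytes]) -> Optional[int]:
    """Check the last hash first, then the one before it, and so on.

    path_keys[0] belongs to the witness node. Returns the 1-based index of
    the first hash that does not match the received proof, or None when
    all hashes are consistent.
    """
    for i in range(len(hashes), 0, -1):
        if not hmac.compare_digest(hashes[i - 1], keyed_hash(path_keys[i - 1], d)):
            return i
    return None
```

When the proof is replaced somewhere along the path, the hashes attached before the replacement no longer match the proof that arrives, so the backward scan stops at the boundary of the alteration.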

Recall that the communication cost of the original SRQ is
$O(N + d \log N)$. In the modified SRQ without those two
unrealistic assumptions, each node $s_i$ at epoch $t$ is required
to additionally send $\vartheta^0_{i,t}, \dots, \vartheta^m_{i,t}$, leading to $O(N)$
communication cost. At epoch $t$, approximately $\xi_2(N-1)/2^{\ell_h}$
witness nodes will send their subtree proofs to $\mathcal{M}$.
Nevertheless, when the subtree proofs are sent to $\mathcal{M}$, since the
additional hashes will be added, the communication cost will
be $O(N \cdot (\sqrt{N} + (\sqrt{N} - 1) + \cdots + 1))$. As a result, the
communication cost becomes $O(N^2 + d \log N)$.

Basically, STQ can be regarded as a special use of SRQ. 
Thus, its resiliency against false-incrimination attack is the 
same as that of SRQ. Nevertheless, due to the nature of top- 
k query, the injection of the falsified sensor reading from 

7 The path from any sensor node in the aggregation tree to $\mathcal{M}$ is unique.
The authority can infer the nodes the subtree proof traverses once it is aware
of the ID of the witness node.
the compromised nodes will imply a significant query-result
deviation. As a consequence, the resilience of STQ against the
collusion attack should be reconsidered; it is discussed in
the extended version [32] of this paper.

Impact of Sensor Node Compromise on SSQ. Recall
that the two aforementioned unrealistic assumptions are made. We
first consider the resiliency against false-incrimination attacks.
The simplest method is to introduce a parameter $\xi_3 < \xi_1$
and have $\mathcal{M}$ report to the authority all the verification seeds,
instead of their hash as in the original SSQ. After that, for
each group, if at least $\xi_3$ out of $\xi_1$ verification seeds can
be successfully verified, then the quasi-skyline data in that
group are considered complete. Hence, the threat of
false-incrimination attacks will be mitigated because the adversary
is forced to send at least $\xi_1 - \xi_3 + 1$ false proofs, instead of
a single false proof. If even one verification seed from
a specific group fails to be verified, we can conclude that at
least one compromised sensor node exists in that group.
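The per-group threshold check can be sketched as follows, again assuming HMAC-SHA256 as the keyed hash and a pre-agreed byte encoding of the group's skyline bucket set. The function name `group_status` and its return convention are hypothetical.

```python
import hashlib
import hmac
from typing import List, Tuple


def group_status(reports: List[Tuple[bytes, bytes]],
                 encoded_buckets: bytes,
                 xi3: int) -> Tuple[bool, bool]:
    """Evaluate the verification seeds reported for one group.

    reports holds (epoch key, reported verification seed) for each of the
    reporters of the group. Returns (complete, compromise_suspected):
    the data are accepted when at least xi3 seeds verify, and a compromised
    sensor node is suspected as soon as any seed fails.
    """
    verified = sum(
        hmac.compare_digest(seed, hmac.new(key, encoded_buckets, hashlib.sha256).digest())
        for key, seed in reports
    )
    return verified >= xi3, verified < len(reports)
```

With three reporters and a threshold of two, a single bogus seed no longer causes the storage node to be blamed, while the group is still flagged as containing a compromised node.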

Now, we consider collusion attacks. In SSQ, both $\mathcal{M}$
and the sensor nodes only know the quasi-skyline data. Recall
that quasi-skyline data are not necessarily the skyline data.
Thus, even when there is more than one compromised sensor
node in a group, the probability of successfully dropping the
skyline data is actually small. More specifically, to drop the
skyline data at a specified epoch, $\mathcal{S}$ should contain at least $\xi_3$
sensor nodes responsible for sending hash values in order to
successfully forge a proof of incomplete quasi-skyline data,
and at the same time should be fortunate enough to select
groups whose quasi-skyline data contain skyline data$^8$.

Now, we consider both the collusion and false-incrimination 
attacks. To drop the skyline data, the only thing $\mathcal{M}$ can do 
is to drop the data of certain groups. Here, for simplicity, we 
consider the case where $\mathcal{M}$ drops the data of a fixed group $G_\eta$, 
$\eta \in [1, \mu]$, in which a set $\mathcal{S}$ of $x$ sensor nodes is compromised. 
To prevent the detection of an incomplete query-result, $\xi_3$ out of the $x$ 
compromised sensor nodes should be sensor nodes responsible 
for sending the proofs. The probability that at least $\xi_3$ out 
of the $\xi_1$ nodes responsible for sending the proofs are contained 
in $\mathcal{S}$ is $p_1 = \sum_{i=\xi_3}^{\xi_1} \binom{\xi_1}{i} \binom{|G_\eta| - \xi_1}{x - i} / \binom{|G_\eta|}{x}$. In other words, 
this is equal to the probability that the compromised sensor 
nodes can drop the quasi-skyline data without being detected. 
Quasi-skyline data, however, are not necessarily equivalent to 
the skyline data. The probability that the quasi-skyline data 
dropped by compromised nodes indeed contain skyline data 

can be represented as p 2 = ( |G Jy/n- Pc y)/(\gJy/n)> where 
$p_c$ is the average ratio of the skyline data to all the sensed data. 
In short, this is equal to the probability that the operations per- 
formed by compromised nodes cause the loss of skyline data. 
As a whole, even if the adversary has $x$ compromised nodes in 
$G_\eta$, the probability of successfully making the skyline query-result 
incomplete is merely $p_1 \cdot p_2$. The trends of this probability 
are depicted in Fig. 7 under different parameter settings. As 
shown in Fig. 7a, $p_1 \cdot p_2$ decreases with an increase of 

8 Admittedly, $\mathcal{M}$ can simply drop all the sensed data. Nevertheless, under 
this option, it is forced to forge at least $\mu(\xi_1 - \xi_3 + 1)$ proofs ($\mu = \sqrt{N}$ in 
our analysis), leading to a high probability of being detected. 



$N$ and $Y$. This is because when $N$ and $Y$ become larger, it 
is less likely that the quasi-skyline data dropped by the 
adversary contain the skyline data. Nevertheless, as shown in 
Fig. 7b, $p_1 \cdot p_2$ increases with an increase of $(\xi_1 - \xi_3)$. This 
is because when $(\xi_1 - \xi_3)$ becomes larger, it is more likely 
that $\mathcal{S}$ can forge a proof for incomplete quasi-skyline data. 
As a whole, from the false-incrimination attack point of view, 
a larger $(\xi_1 - \xi_3)$ is preferable because the adversary must then 
forge more false proofs, but from the collusion attack point of 
view, the smaller the $(\xi_1 - \xi_3)$, the lower the $p_1 \cdot p_2$. Choosing 
$\xi_3$ is therefore an optimization problem that 
deserves further study. In addition, compared with the original 
SSQ, the additional communication cost incurred by the 
modified SSQ comes from the transmission of verification 
seeds from $\mathcal{M}$ to the authority. Thus, the communication cost 
of the modified SSQ scheme remains $O(N^{3/2} + Nd)$. 
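
The probability $p_1$ that at least $\xi_3$ of the $\xi_1$ proof-sending nodes of a group are among its $x$ compromised nodes is a hypergeometric tail, and it can be evaluated numerically. The following is a minimal sketch under the assumption that the compromised nodes form a uniformly random subset of the group; the function name `evade_detection_probability` and its parameter names are hypothetical.

```python
from math import comb


def evade_detection_probability(group_size: int, xi1: int, xi3: int, x: int) -> float:
    """P(at least xi3 of the xi1 proof-sending nodes are among x compromised nodes).

    Assumes the x compromised nodes are a uniformly random subset of a
    group with group_size nodes (an assumption of this sketch).
    """
    if not (0 < xi3 <= xi1 <= group_size and 0 <= x <= group_size):
        raise ValueError("need 0 < xi3 <= xi1 <= group_size and 0 <= x <= group_size")
    total = comb(group_size, x)
    # i = number of proof-sending nodes that are compromised; it cannot exceed x.
    favourable = sum(
        comb(xi1, i) * comb(group_size - xi1, x - i)
        for i in range(xi3, min(xi1, x) + 1)
    )
    return favourable / total
```

Lowering $\xi_3$ for a fixed $\xi_1$ adds terms to the sum, which matches the observation that a larger $(\xi_1 - \xi_3)$ makes a forged proof of incomplete quasi-skyline data more likely.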




Fig. 7: The probability of successfully dropping skyline data in the cases where (a) 
$\xi_1 = 8$ and $\xi_3 = 4$, and (b) $N = 500$ and $Y = 100$. 

VI. Conclusion 

We propose schemes for securing range queries, top-$k$ queries, 
and skyline queries, respectively. Two critical performance 
metrics, detection probability and communication cost, are 
analyzed. In particular, SRQ incurs the lowest communication 
overhead among prior works, while STQ and SSQ are the 
first proposals for securing top-$k$ and skyline queries, 
respectively, in tiered sensor networks. We also investigate the 
security impact of collusion attacks and newly identified false- 
incrimination attacks, and explore the resiliency of the proposed 
schemes against these two attacks. 